I successfully started a Vizier job via CloudTuner, but the job failed.

I looked through the logs, but no errors were reported, and the training itself completed successfully. Could you take a look at what happened? The logs should be read from bottom to top.
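Because the log export is newest-first, it can help to flip it into chronological order and pull out only the warning and error lines before reading it. The following is a minimal sketch, not part of TFX or tensorflow-cloud: the helper names `chronological` and `by_severity` are hypothetical, and the regular expression assumes the absl-style prefix seen in this log (severity letter, MMDD, time, thread id, `file:line]`).

```python
import re

# absl-style prefix as it appears in the log, e.g.
# "W0813 04:24:21.157318 139812382340928 examples_utils.py:50] ..."
ABSL_PREFIX = re.compile(
    r"^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d+) +\d+ ([\w.]+):(\d+)\]"
)


def chronological(lines):
    """Return the newest-first log lines in oldest-first order."""
    return list(reversed(lines))


def by_severity(lines, levels=("W", "E", "F")):
    """Keep only absl-formatted lines whose severity letter is in `levels`."""
    kept = []
    for line in lines:
        match = ABSL_PREFIX.match(line)
        if match and match.group(1) in levels:
            kept.append(line)
    return kept


if __name__ == "__main__":
    # A few lines copied from the export, newest first.
    exported = [
        "Job failed.",
        "Finished tearing down training program.",
        "W0813 04:24:21.157318 139812382340928 examples_utils.py:50] "
        "Examples artifact does not have payload_format custom property. "
        "Falling back to FORMAT_TF_EXAMPLE",
        "I0813 04:24:20.710414 139812382340928 run_executor.py:155] Starting executor",
    ]
    for line in chronological(exported):
        print(line)
    print("warnings/errors:", len(by_severity(exported)))
```

Lines without an absl prefix (for example `Job failed.`) are ignored by `by_severity`, so they still need to be read by hand.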

jsonPayload.message
--
Job failed.
Finished tearing down training program.
2022/08/13 04:24:30 No id provided.
. Setting to DenseTensor.
}
size: 1
I0813 04:24:27.917382 139812382340928 tensor_representation_util.py:347] Feature label_xf has a shape dim {
. Setting to DenseTensor.
}
size: 3
dim {
}
size: 224
dim {
}
size: 224
I0813 04:24:27.917122 139812382340928 tensor_representation_util.py:347] Feature image_xf has a shape dim {
. Setting to DenseTensor.
}
size: 1
I0813 04:24:27.783987 139812382340928 tensor_representation_util.py:347] Feature label_xf has a shape dim {
. Setting to DenseTensor.
}
size: 3
dim {
}
size: 224
dim {
}
size: 224
I0813 04:24:27.783725 139812382340928 tensor_representation_util.py:347] Feature image_xf has a shape dim {
. Setting to DenseTensor.
}
size: 1
I0813 04:24:27.573800 139812382340928 tensor_representation_util.py:347] Feature label_xf has a shape dim {
. Setting to DenseTensor.
}
size: 3
dim {
}
size: 224
dim {
}
size: 224
I0813 04:24:27.573541 139812382340928 tensor_representation_util.py:347] Feature image_xf has a shape dim {
. Setting to DenseTensor.
}
size: 1
I0813 04:24:27.098982 139812382340928 tensor_representation_util.py:347] Feature label_xf has a shape dim {
. Setting to DenseTensor.
}
size: 3
dim {
}
size: 224
dim {
}
size: 224
I0813 04:24:27.098610 139812382340928 tensor_representation_util.py:347] Feature image_xf has a shape dim {
I0813 04:24:26.905468 139812382340928 model.py:33] _________________________________________________________________
I0813 04:24:26.905363 139812382340928 model.py:33] Non-trainable params: 23,587,712
I0813 04:24:26.905245 139812382340928 model.py:33] Trainable params: 20,490
I0813 04:24:26.905140 139812382340928 model.py:33] Total params: 23,608,202
I0813 04:24:26.900732 139812382340928 model.py:33] =================================================================
I0813 04:24:26.900615 139812382340928 model.py:33]
I0813 04:24:26.900457 139812382340928 model.py:33]  dense (Dense)               (None, 10)                20490
I0813 04:24:26.900074 139812382340928 model.py:33]
I0813 04:24:26.899939 139812382340928 model.py:33]  dropout (Dropout)           (None, 2048)              0
I0813 04:24:26.899660 139812382340928 model.py:33]
I0813 04:24:26.899552 139812382340928 model.py:33]  resnet50 (Functional)       (None, 2048)              23587712
I0813 04:24:26.895043 139812382340928 model.py:33] =================================================================
I0813 04:24:26.894924 139812382340928 model.py:33]  Layer (type)                Output Shape              Param #
I0813 04:24:26.894763 139812382340928 model.py:33] _________________________________________________________________
I0813 04:24:26.894547 139812382340928 model.py:33] Model: "sequential"
8192/94765736 [..............................] - ETA: 0s  5955584/94765736 [>.............................] - ETA: 0s 14000128/94765736 [===>..........................] - ETA: 0s 20971520/94765736 [=====>........................] - ETA: 0s 28442624/94765736 [========>.....................] - ETA: 0s 36356096/94765736 [==========>...................] - ETA: 0s 44326912/94765736 [=============>................] - ETA: 0s 52133888/94765736 [===============>..............] - ETA: 0s 60121088/94765736 [==================>...........] - ETA: 0s 67960832/94765736 [====================>.........] - ETA: 0s 75710464/94765736 [======================>.......] - ETA: 0s 83501056/94765736 [=========================>....] - ETA: 0s 91258880/94765736 [===========================>..] - ETA: 0s 94765736/94765736 [==============================] - 1s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
 
tensorflow_cloud.utils.google_api_client.optout_metrics_reporting().
to opt-out, you may do so by running
please refer to https://policies.google.com/privacy. If you wish
Cloud Services in accordance with Google privacy policy, for more information
This application reports technical and operational details of your usage of
 
2022-08-13 04:24:23.295037: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10807 MB memory:  -> device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7
2022-08-13 04:24:23.294182: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-13 04:24:23.293289: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-13 04:24:23.292287: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-13 04:24:22.735514: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-13 04:24:22.734559: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-13 04:24:22.733457: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-13 04:24:22.732728: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
I0813 04:24:21.739562 139812382340928 google_api_client.py:132] Detected running in DL_CONTAINER environment.
Load existing study...
I0813 04:24:21.737804 139812382340928 tuner.py:197] Study already exists: projects/gcp-ml-172005/locations/us-central1/studies/CloudTuner_study_20220813_042421.
Load existing study...
INFO:tensorflow:Study already exists: projects/gcp-ml-172005/locations/us-central1/studies/CloudTuner_study_20220813_042421.
I0813 04:24:21.696875 139812382340928 tuner.py:197] {'name': 'projects/874401645461/locations/us-central1/studies/CloudTuner_study_20220813_042421', 'studyConfig': {'metrics': [{'goal': 'MAXIMIZE', 'metric': 'val_sparse_categorical_accuracy'}], 'parameters': [{'parameter': 'learning_rate', 'type': 'DISCRETE', 'discreteValueSpec': {'values': [0.001, 0.01]}}], 'automatedStoppingConfig': {'decayCurveStoppingConfig': {'useElapsedTime': True}}}, 'state': 'ACTIVE', 'createTime': '2022-08-13T04:24:21Z'}
INFO:tensorflow:{'name': 'projects/874401645461/locations/us-central1/studies/CloudTuner_study_20220813_042421', 'studyConfig': {'metrics': [{'goal': 'MAXIMIZE', 'metric': 'val_sparse_categorical_accuracy'}], 'parameters': [{'parameter': 'learning_rate', 'type': 'DISCRETE', 'discreteValueSpec': {'values': [0.001, 0.01]}}], 'automatedStoppingConfig': {'decayCurveStoppingConfig': {'useElapsedTime': True}}}, 'state': 'ACTIVE', 'createTime': '2022-08-13T04:24:21Z'}
I0813 04:24:21.171569 139812382340928 google_api_client.py:132] Detected running in DL_CONTAINER environment.
I0813 04:24:21.171575 139812382340928 google_api_client.py:132] Detected running in DL_CONTAINER environment.
 
tensorflow_cloud.utils.google_api_client.optout_metrics_reporting().
to opt-out, you may do so by running
please refer to https://policies.google.com/privacy. If you wish
Cloud Services in accordance with Google privacy policy, for more information
This application reports technical and operational details of your usage of
I0813 04:24:21.170827 139812382340928 google_api_client.py:185]
 
tensorflow_cloud.utils.google_api_client.optout_metrics_reporting().
to opt-out, you may do so by running
please refer to https://policies.google.com/privacy. If you wish
Cloud Services in accordance with Google privacy policy, for more information
This application reports technical and operational details of your usage of
I0813 04:24:21.170828 139812382340928 google_api_client.py:185]
W0813 04:24:21.157318 139812382340928 examples_utils.py:50] Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
W0813 04:24:21.157322 139812382340928 examples_utils.py:50] Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
W0813 04:24:21.157067 139812382340928 examples_utils.py:50] Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
W0813 04:24:21.157056 139812382340928 examples_utils.py:50] Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
W0813 04:24:21.156749 139812382340928 examples_utils.py:50] Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
W0813 04:24:21.156746 139812382340928 examples_utils.py:50] Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
I0813 04:24:21.156335 139812382340928 fn_args_utils.py:138] Evaluate on the 'eval' split when eval_args.splits is not set.
I0813 04:24:21.156317 139812382340928 fn_args_utils.py:138] Evaluate on the 'eval' split when eval_args.splits is not set.
I0813 04:24:21.156160 139812382340928 fn_args_utils.py:134] Train on the 'train' split when train_args.splits is not set.
I0813 04:24:21.156160 139812382340928 fn_args_utils.py:134] Train on the 'train' split when train_args.splits is not set.
I0813 04:24:20.723299 139812382340928 udf_utils.py:48] udf_utils.get_fn {'custom_config': '{"ai_platform_tuning_args": {"masterConfig": {"acceleratorConfig": {"count": 1, "type": "NVIDIA_TESLA_K80"}, "imageUri": "gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test"}, "masterType": "n1-standard-4", "project": "gcp-ml-172005", "region": "us-central1", "scaleTier": "CUSTOM", "serviceAccount": "vizier@gcp-ml-172005.iam.gserviceaccount.com", "workerConfig": {"acceleratorConfig": {"count": 1, "type": "NVIDIA_TESLA_K80"}, "imageUri": "gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test"}, "workerCount": 3, "workerType": "n1-standard-4"}, "remote_trials_working_dir": "gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/trials"}', 'eval_args': '{\n  "num_steps": 4\n}', 'train_args': '{\n  "num_steps": 160\n}', 'tune_args': '{\n  "num_parallel_trials": 3\n}', 'tuner_fn': 'models.model.cloud_tuner_fn'} 'tuner_fn'
I0813 04:24:20.723111 139812382340928 executor.py:212] Binding chief oracle server at: 0.0.0.0:2222
I0813 04:24:20.722659 139812382340928 executor.py:200] chief_oracle() starting...
I0813 04:24:20.722256 139812382340928 udf_utils.py:48] udf_utils.get_fn {'custom_config': '{"ai_platform_tuning_args": {"masterConfig": {"acceleratorConfig": {"count": 1, "type": "NVIDIA_TESLA_K80"}, "imageUri": "gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test"}, "masterType": "n1-standard-4", "project": "gcp-ml-172005", "region": "us-central1", "scaleTier": "CUSTOM", "serviceAccount": "vizier@gcp-ml-172005.iam.gserviceaccount.com", "workerConfig": {"acceleratorConfig": {"count": 1, "type": "NVIDIA_TESLA_K80"}, "imageUri": "gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test"}, "workerCount": 3, "workerType": "n1-standard-4"}, "remote_trials_working_dir": "gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/trials"}', 'eval_args': '{\n  "num_steps": 4\n}', 'train_args': '{\n  "num_steps": 160\n}', 'tune_args': '{\n  "num_parallel_trials": 3\n}', 'tuner_fn': 'models.model.cloud_tuner_fn'} 'tuner_fn'
I0813 04:24:20.722024 139812382340928 executor.py:275] Setting KERASTUNER_TUNER_ID with tfx-tuner-master-0
I0813 04:24:20.721865 139812382340928 executor.py:267] Oracle chief is known to be at: cmle-training-master-afa651e2fc-0:2222
I0813 04:24:20.720906 139812382340928 executor.py:233] Chief oracle started at PID: 16
I0813 04:24:20.710414 139812382340928 run_executor.py:155] Starting executor
I0813 04:24:20.709932 139812382340928 executor.py:332] Tuner ID is: tfx-tuner-master-0
I0813 04:24:20.709692 139812382340928 executor.py:300] Cluster spec initalized with: {'cluster': {'master': ['cmle-training-master-afa651e2fc-0:2222'], 'worker': ['cmle-training-worker-afa651e2fc-0:2222', 'cmle-training-worker-afa651e2fc-1:2222']}, 'environment': 'cloud', 'task': {'type': 'master', 'index': 0}, 'job': '{\n  "scale_tier": "CUSTOM",\n  "master_type": "n1-standard-4",\n  "worker_type": "n1-standard-4",\n  "worker_count": "2",\n  "region": "us-central1",\n  "master_config": {\n    "accelerator_config": {\n      "count": "1",\n      "type": "NVIDIA_TESLA_K80"\n    },\n    "image_uri": "gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test",\n    "container_command": ["python", "-m", "tfx.scripts.run_executor", "--executor_class_path", "tfx.extensions.google_cloud_ai_platform.tuner.executor._WorkerExecutor", "--inputs", "{\\"transform_graph\\": [{\\"artifact\\": {\\"id\\": \\"3040439057790690801\\", \\"uri\\": \\"gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/874401645461/resnet50-tfx-pipeline-tuner-test-20220813040932/Transform_2187476734192910336/transform_graph\\", \\"custom_properties\\": {\\"tfx_version\\": {\\"struct_value\\": {\\"__value__\\": \\"1.9.1\\"}}}}, \\"artifact_type\\": {\\"name\\": \\"TransformGraph\\"}, \\"__artifact_class_module__\\": \\"tfx.types.standard_artifacts\\", \\"__artifact_class_name__\\": \\"TransformGraph\\"}], \\"examples\\": [{\\"artifact\\": {\\"id\\": \\"6958007971664455536\\", \\"uri\\": \\"gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/874401645461/resnet50-tfx-pipeline-tuner-test-20220813040932/Transform_2187476734192910336/transformed_examples\\", \\"properties\\": {\\"split_names\\": {\\"string_value\\": \\"[\\\\\\"eval\\\\\\", \\\\\\"train\\\\\\"]\\"}}, \\"custom_properties\\": {\\"tfx_version\\": {\\"struct_value\\": {\\"__value__\\": \\"1.9.1\\"}}}}, \\"artifact_type\\": {\\"name\\": \\"Examples\\", \\"properties\\": {\\"span\\": 
\\"INT\\", \\"split_names\\": \\"STRING\\", \\"version\\": \\"INT\\"}, \\"base_type\\": \\"DATASET\\"}, \\"__artifact_class_module__\\": \\"tfx.types.standard_artifacts\\", \\"__artifact_class_name__\\": \\"Examples\\"}]}", "--outputs", "{\\"best_hyperparameters\\": [{\\"artifact\\": {\\"id\\": \\"3312416091851715625\\", \\"uri\\": \\"gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/874401645461/resnet50-tfx-pipeline-tuner-test-20220813040932/Tuner_-7035895302661865472/best_hyperparameters\\"}, \\"artifact_type\\": {\\"name\\": \\"HyperParameters\\"}, \\"__artifact_class_module__\\": \\"tfx.types.standard_artifacts\\", \\"__artifact_class_name__\\": \\"HyperParameters\\"}]}", "--exec-properties", "{\\"custom_config\\": \\"{\\\\\\"ai_platform_tuning_args\\\\\\": {\\\\\\"masterConfig\\\\\\": {\\\\\\"acceleratorConfig\\\\\\": {\\\\\\"count\\\\\\": 1, \\\\\\"type\\\\\\": \\\\\\"NVIDIA_TESLA_K80\\\\\\"}, \\\\\\"imageUri\\\\\\": \\\\\\"gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test\\\\\\"}, \\\\\\"masterType\\\\\\": \\\\\\"n1-standard-4\\\\\\", \\\\\\"project\\\\\\": \\\\\\"gcp-ml-172005\\\\\\", \\\\\\"region\\\\\\": \\\\\\"us-central1\\\\\\", \\\\\\"scaleTier\\\\\\": \\\\\\"CUSTOM\\\\\\", \\\\\\"serviceAccount\\\\\\": \\\\\\"vizier@gcp-ml-172005.iam.gserviceaccount.com\\\\\\", \\\\\\"workerConfig\\\\\\": {\\\\\\"acceleratorConfig\\\\\\": {\\\\\\"count\\\\\\": 1, \\\\\\"type\\\\\\": \\\\\\"NVIDIA_TESLA_K80\\\\\\"}, \\\\\\"imageUri\\\\\\": \\\\\\"gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test\\\\\\"}, \\\\\\"workerCount\\\\\\": 3, \\\\\\"workerType\\\\\\": \\\\\\"n1-standard-4\\\\\\"}, \\\\\\"remote_trials_working_dir\\\\\\": \\\\\\"gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/trials\\\\\\"}\\", \\"eval_args\\": \\"{\\\\n  \\\\\\"num_steps\\\\\\": 4\\\\n}\\", \\"train_args\\": \\"{\\\\n  \\\\\\"num_steps\\\\\\": 160\\\\n}\\", \\"tune_args\\": \\"{\\\\n  
\\\\\\"num_parallel_trials\\\\\\": 3\\\\n}\\", \\"tuner_fn\\": \\"models.model.cloud_tuner_fn\\"}"]\n  },\n  "worker_config": {\n    "accelerator_config": {\n      "count": "1",\n      "type": "NVIDIA_TESLA_K80"\n    },\n    "image_uri": "gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test"\n  },\n  "service_account": "vizier@gcp-ml-172005.iam.gserviceaccount.com"\n}'}
I0813 04:24:20.709398 139812382340928 executor.py:292] Initializing cluster spec...
 
 
I0813 04:24:16.823370 139812382340928 executor.py:43] tensorflow_text is not available: No module named 'tensorflow_text'
I0813 04:24:16.796857 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.796637 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.795921 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.795674 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.784358 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.783092 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.782919 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.781281 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.780695 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.780493 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.779530 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.779021 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.415254 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.415065 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.414037 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.413864 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.412977 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.412753 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.412003 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.411813 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.410855 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.410645 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.409852 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.409629 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.408295 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.408134 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.407290 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.407044 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.277025 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.276487 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.275387 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.274608 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.274452 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.211828 139812382340928 model_util.py:68] struct2tensor is not available: No module named 'struct2tensor'
I0813 04:24:16.211477 139812382340928 model_util.py:63] tensorflow_decision_forests is not available: No module named 'tensorflow_decision_forests'
I0813 04:24:16.211113 139812382340928 model_util.py:58] tensorflow_text is not available: No module named 'tensorflow_text'
I0813 04:24:16.210595 139812382340928 model_util.py:53] tensorflow_ranking is not available: No module named 'tensorflow_ranking'
I0813 04:24:16.210203 139812382340928 model_util.py:44] imported tensorflow_io
I0813 04:24:15.824912 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:15.824463 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
)]}, exec_properties: {'custom_config': '{"ai_platform_tuning_args": {"masterConfig": {"acceleratorConfig": {"count": 1, "type": "NVIDIA_TESLA_K80"}, "imageUri": "gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test"}, "masterType": "n1-standard-4", "project": "gcp-ml-172005", "region": "us-central1", "scaleTier": "CUSTOM", "serviceAccount": "vizier@gcp-ml-172005.iam.gserviceaccount.com", "workerConfig": {"acceleratorConfig": {"count": 1, "type": "NVIDIA_TESLA_K80"}, "imageUri": "gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test"}, "workerCount": 3, "workerType": "n1-standard-4"}, "remote_trials_working_dir": "gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/trials"}', 'eval_args': '{\n  "num_steps": 4\n}', 'train_args': '{\n  "num_steps": 160\n}', 'tune_args': '{\n  "num_parallel_trials": 3\n}', 'tuner_fn': 'models.model.cloud_tuner_fn'}
, artifact_type: name: "HyperParameters"
uri: "gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/874401645461/resnet50-tfx-pipeline-tuner-test-20220813040932/Tuner_-7035895302661865472/best_hyperparameters"
)]}, outputs: {'best_hyperparameters': [Artifact(artifact: id: 3312416091851715625
base_type: DATASET
}
value: INT
key: "version"
properties {
}
value: STRING
key: "split_names"
properties {
}
value: INT
key: "span"
properties {
, artifact_type: name: "Examples"
}
}
}
}
}
string_value: "1.9.1"
value {
key: "__value__"
fields {
struct_value {
value {
key: "tfx_version"
custom_properties {
}
}
string_value: "[\"eval\", \"train\"]"
value {
key: "split_names"
properties {
uri: "gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/874401645461/resnet50-tfx-pipeline-tuner-test-20220813040932/Transform_2187476734192910336/transformed_examples"
)], 'examples': [Artifact(artifact: id: 6958007971664455536
, artifact_type: name: "TransformGraph"
}
}
}
}
}
string_value: "1.9.1"
value {
key: "__value__"
fields {
struct_value {
value {
key: "tfx_version"
custom_properties {
uri: "gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/874401645461/resnet50-tfx-pipeline-tuner-test-20220813040932/Transform_2187476734192910336/transform_graph"
I0813 04:24:15.720419 139812382340928 run_executor.py:141] Executor tfx.extensions.google_cloud_ai_platform.tuner.executor._WorkerExecutor do: inputs: {'transform_graph': [Artifact(artifact: id: 3040439057790690801
2022-08-13 04:24:15.695750: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-13 04:24:15.694653: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-13 04:24:15.510332: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022/08/13 04:24:10 No id provided.
File system has been successfully mounted.
Mounting file system "gcsfuse"...
Opening GCS connection...
File system has been successfully mounted.
Mounting file system "gcsfuse"...
Opening GCS connection...
File system has been successfully mounted.
Mounting file system "gcsfuse"...
Opening GCS connection...
 
 
 
Job tfx_tuner_20220813041519 is queued.
 
Job creation request has been successfully validated.
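The study configuration that `tuner.py` prints in the log above is a Python dict literal, so it can be read back to check what the tuner was actually searching over. This is a small sketch under that assumption; `parse_study_line` is a hypothetical helper, and the log line is copied from the export.

```python
import ast

# Log line copied from the export above (tuner.py:197).
STUDY_LINE = (
    "I0813 04:24:21.696875 139812382340928 tuner.py:197] "
    "{'name': 'projects/874401645461/locations/us-central1/studies/CloudTuner_study_20220813_042421', "
    "'studyConfig': {'metrics': [{'goal': 'MAXIMIZE', 'metric': 'val_sparse_categorical_accuracy'}], "
    "'parameters': [{'parameter': 'learning_rate', 'type': 'DISCRETE', "
    "'discreteValueSpec': {'values': [0.001, 0.01]}}], "
    "'automatedStoppingConfig': {'decayCurveStoppingConfig': {'useElapsedTime': True}}}, "
    "'state': 'ACTIVE', 'createTime': '2022-08-13T04:24:21Z'}"
)


def parse_study_line(line):
    """Split off the absl prefix and parse the remaining dict literal."""
    payload = line.split("] ", 1)[1]
    return ast.literal_eval(payload)


study = parse_study_line(STUDY_LINE)
config = study["studyConfig"]

# Map each tuned parameter to its discrete candidate values.
search_space = {
    p["parameter"]: p["discreteValueSpec"]["values"]
    for p in config["parameters"]
}
objective = config["metrics"][0]

print(search_space)
print(objective["goal"], objective["metric"])
```

Here the search space is a single discrete `learning_rate` parameter with two candidate values, and the objective is to maximize `val_sparse_categorical_accuracy`.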


jsonPayload.message
Job failed.
Finished tearing down training program.
2022/08/13 04:25:56 No id provided.
2022/08/13 04:25:46 No id provided.
2022/08/13 04:25:01 No id provided.
2022/08/13 04:24:54 No id provided.
2022/08/13 04:24:30 No id provided.
. Setting to DenseTensor.
}
size: 1
I0813 04:24:27.917382 139812382340928 tensor_representation_util.py:347] Feature label_xf has a shape dim {
. Setting to DenseTensor.
}
size: 3
dim {
}
size: 224
dim {
}
size: 224
I0813 04:24:27.917122 139812382340928 tensor_representation_util.py:347] Feature image_xf has a shape dim {
. Setting to DenseTensor.
}
size: 1
I0813 04:24:27.783987 139812382340928 tensor_representation_util.py:347] Feature label_xf has a shape dim {
. Setting to DenseTensor.
}
size: 3
dim {
}
size: 224
dim {
}
size: 224
I0813 04:24:27.783725 139812382340928 tensor_representation_util.py:347] Feature image_xf has a shape dim {
. Setting to DenseTensor.
}
size: 1
I0813 04:24:27.573800 139812382340928 tensor_representation_util.py:347] Feature label_xf has a shape dim {
. Setting to DenseTensor.
}
size: 3
dim {
}
size: 224
dim {
}
size: 224
I0813 04:24:27.573541 139812382340928 tensor_representation_util.py:347] Feature image_xf has a shape dim {
. Setting to DenseTensor.
}
size: 1
I0813 04:24:27.098982 139812382340928 tensor_representation_util.py:347] Feature label_xf has a shape dim {
. Setting to DenseTensor.
}
size: 3
dim {
}
size: 224
dim {
}
size: 224
I0813 04:24:27.098610 139812382340928 tensor_representation_util.py:347] Feature image_xf has a shape dim {
I0813 04:24:26.905468 139812382340928 model.py:33] _________________________________________________________________
I0813 04:24:26.905363 139812382340928 model.py:33] Non-trainable params: 23,587,712
I0813 04:24:26.905245 139812382340928 model.py:33] Trainable params: 20,490
I0813 04:24:26.905140 139812382340928 model.py:33] Total params: 23,608,202
I0813 04:24:26.900732 139812382340928 model.py:33] =================================================================
I0813 04:24:26.900615 139812382340928 model.py:33]
I0813 04:24:26.900457 139812382340928 model.py:33]  dense (Dense)               (None, 10)                20490
I0813 04:24:26.900074 139812382340928 model.py:33]
I0813 04:24:26.899939 139812382340928 model.py:33]  dropout (Dropout)           (None, 2048)              0
I0813 04:24:26.899660 139812382340928 model.py:33]
I0813 04:24:26.899552 139812382340928 model.py:33]  resnet50 (Functional)       (None, 2048)              23587712
I0813 04:24:26.895043 139812382340928 model.py:33] =================================================================
I0813 04:24:26.894924 139812382340928 model.py:33]  Layer (type)                Output Shape              Param #
I0813 04:24:26.894763 139812382340928 model.py:33] _________________________________________________________________
I0813 04:24:26.894547 139812382340928 model.py:33] Model: "sequential"
    8192/94765736 [..............................] - ETA: 0s
 5955584/94765736 [>.............................] - ETA: 0s
14000128/94765736 [===>..........................] - ETA: 0s
20971520/94765736 [=====>........................] - ETA: 0s
28442624/94765736 [========>.....................] - ETA: 0s
36356096/94765736 [==========>...................] - ETA: 0s
44326912/94765736 [=============>................] - ETA: 0s
52133888/94765736 [===============>..............] - ETA: 0s
60121088/94765736 [==================>...........] - ETA: 0s
67960832/94765736 [====================>.........] - ETA: 0s
75710464/94765736 [======================>.......] - ETA: 0s
83501056/94765736 [=========================>....] - ETA: 0s
91258880/94765736 [===========================>..] - ETA: 0s
94765736/94765736 [==============================] - 1s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5

tensorflow_cloud.utils.google_api_client.optout_metrics_reporting().
to opt-out, you may do so by running
please refer to https://policies.google.com/privacy. If you wish
Cloud Services in accordance with Google privacy policy, for more information
This application reports technical and operational details of your usage of

2022-08-13 04:24:23.295037: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10807 MB memory:  -> device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7
2022-08-13 04:24:23.294182: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-13 04:24:22.732728: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
I0813 04:24:21.739562 139812382340928 google_api_client.py:132] Detected running in DL_CONTAINER environment.
Load existing study...
I0813 04:24:21.737804 139812382340928 tuner.py:197] Study already exists: projects/gcp-ml-172005/locations/us-central1/studies/CloudTuner_study_20220813_042421.
Load existing study...
INFO:tensorflow:Study already exists: projects/gcp-ml-172005/locations/us-central1/studies/CloudTuner_study_20220813_042421.
I0813 04:24:21.696875 139812382340928 tuner.py:197] {'name': 'projects/874401645461/locations/us-central1/studies/CloudTuner_study_20220813_042421', 'studyConfig': {'metrics': [{'goal': 'MAXIMIZE', 'metric': 'val_sparse_categorical_accuracy'}], 'parameters': [{'parameter': 'learning_rate', 'type': 'DISCRETE', 'discreteValueSpec': {'values': [0.001, 0.01]}}], 'automatedStoppingConfig': {'decayCurveStoppingConfig': {'useElapsedTime': True}}}, 'state': 'ACTIVE', 'createTime': '2022-08-13T04:24:21Z'}
INFO:tensorflow:{'name': 'projects/874401645461/locations/us-central1/studies/CloudTuner_study_20220813_042421', 'studyConfig': {'metrics': [{'goal': 'MAXIMIZE', 'metric': 'val_sparse_categorical_accuracy'}], 'parameters': [{'parameter': 'learning_rate', 'type': 'DISCRETE', 'discreteValueSpec': {'values': [0.001, 0.01]}}], 'automatedStoppingConfig': {'decayCurveStoppingConfig': {'useElapsedTime': True}}}, 'state': 'ACTIVE', 'createTime': '2022-08-13T04:24:21Z'}
I0813 04:24:21.171569 139812382340928 google_api_client.py:132] Detected running in DL_CONTAINER environment.
I0813 04:24:21.171575 139812382340928 google_api_client.py:132] Detected running in DL_CONTAINER environment.

tensorflow_cloud.utils.google_api_client.optout_metrics_reporting().
to opt-out, you may do so by running
please refer to https://policies.google.com/privacy. If you wish
Cloud Services in accordance with Google privacy policy, for more information
This application reports technical and operational details of your usage of
I0813 04:24:21.170827 139812382340928 google_api_client.py:185]

tensorflow_cloud.utils.google_api_client.optout_metrics_reporting().
to opt-out, you may do so by running
please refer to https://policies.google.com/privacy. If you wish
Cloud Services in accordance with Google privacy policy, for more information
This application reports technical and operational details of your usage of
I0813 04:24:21.170828 139812382340928 google_api_client.py:185]
W0813 04:24:21.157318 139812382340928 examples_utils.py:50] Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
I0813 04:24:21.156335 139812382340928 fn_args_utils.py:138] Evaluate on the 'eval' split when eval_args.splits is not set.
I0813 04:24:21.156317 139812382340928 fn_args_utils.py:138] Evaluate on the 'eval' split when eval_args.splits is not set.
I0813 04:24:21.156160 139812382340928 fn_args_utils.py:134] Train on the 'train' split when train_args.splits is not set.
I0813 04:24:21.156160 139812382340928 fn_args_utils.py:134] Train on the 'train' split when train_args.splits is not set.
I0813 04:24:20.723299 139812382340928 udf_utils.py:48] udf_utils.get_fn {'custom_config': '{"ai_platform_tuning_args": {"masterConfig": {"acceleratorConfig": {"count": 1, "type": "NVIDIA_TESLA_K80"}, "imageUri": "gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test"}, "masterType": "n1-standard-4", "project": "gcp-ml-172005", "region": "us-central1", "scaleTier": "CUSTOM", "serviceAccount": "vizier@gcp-ml-172005.iam.gserviceaccount.com", "workerConfig": {"acceleratorConfig": {"count": 1, "type": "NVIDIA_TESLA_K80"}, "imageUri": "gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test"}, "workerCount": 3, "workerType": "n1-standard-4"}, "remote_trials_working_dir": "gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/trials"}', 'eval_args': '{\n  "num_steps": 4\n}', 'train_args': '{\n  "num_steps": 160\n}', 'tune_args': '{\n  "num_parallel_trials": 3\n}', 'tuner_fn': 'models.model.cloud_tuner_fn'} 'tuner_fn'
I0813 04:24:20.723111 139812382340928 executor.py:212] Binding chief oracle server at: 0.0.0.0:2222
I0813 04:24:20.722659 139812382340928 executor.py:200] chief_oracle() starting...
I0813 04:24:20.722256 139812382340928 udf_utils.py:48] udf_utils.get_fn {'custom_config': '{"ai_platform_tuning_args": {"masterConfig": {"acceleratorConfig": {"count": 1, "type": "NVIDIA_TESLA_K80"}, "imageUri": "gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test"}, "masterType": "n1-standard-4", "project": "gcp-ml-172005", "region": "us-central1", "scaleTier": "CUSTOM", "serviceAccount": "vizier@gcp-ml-172005.iam.gserviceaccount.com", "workerConfig": {"acceleratorConfig": {"count": 1, "type": "NVIDIA_TESLA_K80"}, "imageUri": "gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test"}, "workerCount": 3, "workerType": "n1-standard-4"}, "remote_trials_working_dir": "gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/trials"}', 'eval_args': '{\n  "num_steps": 4\n}', 'train_args': '{\n  "num_steps": 160\n}', 'tune_args': '{\n  "num_parallel_trials": 3\n}', 'tuner_fn': 'models.model.cloud_tuner_fn'} 'tuner_fn'
I0813 04:24:20.722024 139812382340928 executor.py:275] Setting KERASTUNER_TUNER_ID with tfx-tuner-master-0
I0813 04:24:20.721865 139812382340928 executor.py:267] Oracle chief is known to be at: cmle-training-master-afa651e2fc-0:2222
I0813 04:24:20.720906 139812382340928 executor.py:233] Chief oracle started at PID: 16
I0813 04:24:20.710414 139812382340928 run_executor.py:155] Starting executor
I0813 04:24:20.709932 139812382340928 executor.py:332] Tuner ID is: tfx-tuner-master-0
I0813 04:24:20.709692 139812382340928 executor.py:300] Cluster spec initalized with: {'cluster': {'master': ['cmle-training-master-afa651e2fc-0:2222'], 'worker': ['cmle-training-worker-afa651e2fc-0:2222', 'cmle-training-worker-afa651e2fc-1:2222']}, 'environment': 'cloud', 'task': {'type': 'master', 'index': 0}, 'job': '{\n  "scale_tier": "CUSTOM",\n  "master_type": "n1-standard-4",\n  "worker_type": "n1-standard-4",\n  "worker_count": "2",\n  "region": "us-central1",\n  "master_config": {\n    "accelerator_config": {\n      "count": "1",\n      "type": "NVIDIA_TESLA_K80"\n    },\n    "image_uri": "[gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test](http://gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test)",\n    "container_command": ["python", "-m", "tfx.scripts.run_executor", "--executor_class_path", "tfx.extensions.google_cloud_ai_platform.tuner.executor._WorkerExecutor", "--inputs", "{\\"transform_graph\\": [{\\"artifact\\": {\\"id\\": \\"3040439057790690801\\", \\"uri\\": \\"gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/874401645461/resnet50-tfx-pipeline-tuner-test-20220813040932/Transform_2187476734192910336/transform_graph\\", \\"custom_properties\\": {\\"tfx_version\\": {\\"struct_value\\": {\\"__value__\\": \\"1.9.1\\"}}}}, \\"artifact_type\\": {\\"name\\": \\"TransformGraph\\"}, \\"__artifact_class_module__\\": \\"tfx.types.standard_artifacts\\", \\"__artifact_class_name__\\": \\"TransformGraph\\"}], \\"examples\\": [{\\"artifact\\": {\\"id\\": \\"6958007971664455536\\", \\"uri\\": \\"gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/874401645461/resnet50-tfx-pipeline-tuner-test-20220813040932/Transform_2187476734192910336/transformed_examples\\", \\"properties\\": {\\"split_names\\": {\\"string_value\\": \\"[\\\\\\"eval\\\\\\", \\\\\\"train\\\\\\"]\\"}}, \\"custom_properties\\": {\\"tfx_version\\": {\\"struct_value\\": {\\"__value__\\": \\"1.9.1\\"}}}}, 
\\"artifact_type\\": {\\"name\\": \\"Examples\\", \\"properties\\": {\\"span\\": \\"INT\\", \\"split_names\\": \\"STRING\\", \\"version\\": \\"INT\\"}, \\"base_type\\": \\"DATASET\\"}, \\"__artifact_class_module__\\": \\"tfx.types.standard_artifacts\\", \\"__artifact_class_name__\\": \\"Examples\\"}]}", "--outputs", "{\\"best_hyperparameters\\": [{\\"artifact\\": {\\"id\\": \\"3312416091851715625\\", \\"uri\\": \\"gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/874401645461/resnet50-tfx-pipeline-tuner-test-20220813040932/Tuner_-7035895302661865472/best_hyperparameters\\"}, \\"artifact_type\\": {\\"name\\": \\"HyperParameters\\"}, \\"__artifact_class_module__\\": \\"tfx.types.standard_artifacts\\", \\"__artifact_class_name__\\": \\"HyperParameters\\"}]}", "--exec-properties", "{\\"custom_config\\": \\"{\\\\\\"ai_platform_tuning_args\\\\\\": {\\\\\\"masterConfig\\\\\\": {\\\\\\"acceleratorConfig\\\\\\": {\\\\\\"count\\\\\\": 1, \\\\\\"type\\\\\\": \\\\\\"NVIDIA_TESLA_K80\\\\\\"}, \\\\\\"imageUri\\\\\\": \\\\\\"[gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test\\\\\\](http://gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test%5C%5C%5C%5C%5C%5C)"}, \\\\\\"masterType\\\\\\": \\\\\\"n1-standard-4\\\\\\", \\\\\\"project\\\\\\": \\\\\\"gcp-ml-172005\\\\\\", \\\\\\"region\\\\\\": \\\\\\"us-central1\\\\\\", \\\\\\"scaleTier\\\\\\": \\\\\\"CUSTOM\\\\\\", \\\\\\"serviceAccount\\\\\\": \\\\\\"[vizier@gcp-ml-172005.iam.gserviceaccount.com](mailto:vizier@gcp-ml-172005.iam.gserviceaccount.com)\\\\\\", \\\\\\"workerConfig\\\\\\": {\\\\\\"acceleratorConfig\\\\\\": {\\\\\\"count\\\\\\": 1, \\\\\\"type\\\\\\": \\\\\\"NVIDIA_TESLA_K80\\\\\\"}, \\\\\\"imageUri\\\\\\": \\\\\\"[gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test\\\\\\](http://gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test%5C%5C%5C%5C%5C%5C)"}, \\\\\\"workerCount\\\\\\": 3, \\\\\\"workerType\\\\\\": \\\\\\"n1-standard-4\\\\\\"}, \\\\\\"remote_trials_working_dir\\\\\\": 
\\\\\\"gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/trials\\\\\\"}\\", \\"eval_args\\": \\"{\\\\n  \\\\\\"num_steps\\\\\\": 4\\\\n}\\", \\"train_args\\": \\"{\\\\n  \\\\\\"num_steps\\\\\\": 160\\\\n}\\", \\"tune_args\\": \\"{\\\\n  \\\\\\"num_parallel_trials\\\\\\": 3\\\\n}\\", \\"tuner_fn\\": \\"models.model.cloud_tuner_fn\\"}"]\n  },\n  "worker_config": {\n    "accelerator_config": {\n      "count": "1",\n      "type": "NVIDIA_TESLA_K80"\n    },\n    "image_uri": "[gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test](http://gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test)"\n  },\n  "service_account": "[vizier@gcp-ml-172005.iam.gserviceaccount.com](mailto:vizier@gcp-ml-172005.iam.gserviceaccount.com)"\n}'}
I0813 04:24:20.709398 139812382340928 executor.py:292] Initializing cluster spec...


I0813 04:24:16.823370 139812382340928 executor.py:43] tensorflow_text is not available: No module named 'tensorflow_text'
I0813 04:24:16.796857 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.211828 139812382340928 model_util.py:68] struct2tensor is not available: No module named 'struct2tensor'
I0813 04:24:16.211477 139812382340928 model_util.py:63] tensorflow_decision_forests is not available: No module named 'tensorflow_decision_forests'
I0813 04:24:16.211113 139812382340928 model_util.py:58] tensorflow_text is not available: No module named 'tensorflow_text'
I0813 04:24:16.210595 139812382340928 model_util.py:53] tensorflow_ranking is not available: No module named 'tensorflow_ranking'
I0813 04:24:16.210203 139812382340928 model_util.py:44] imported tensorflow_io
I0813 04:24:15.824912 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:15.824463 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
)]}, exec_properties: {'custom_config': '{"ai_platform_tuning_args": {"masterConfig": {"acceleratorConfig": {"count": 1, "type": "NVIDIA_TESLA_K80"}, "imageUri": "gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test"}, "masterType": "n1-standard-4", "project": "gcp-ml-172005", "region": "us-central1", "scaleTier": "CUSTOM", "serviceAccount": "vizier@gcp-ml-172005.iam.gserviceaccount.com", "workerConfig": {"acceleratorConfig": {"count": 1, "type": "NVIDIA_TESLA_K80"}, "imageUri": "gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test"}, "workerCount": 3, "workerType": "n1-standard-4"}, "remote_trials_working_dir": "gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/trials"}', 'eval_args': '{\n  "num_steps": 4\n}', 'train_args': '{\n  "num_steps": 160\n}', 'tune_args': '{\n  "num_parallel_trials": 3\n}', 'tuner_fn': 'models.model.cloud_tuner_fn'}
, artifact_type: name: "HyperParameters"
uri: "gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/874401645461/resnet50-tfx-pipeline-tuner-test-20220813040932/Tuner_-7035895302661865472/best_hyperparameters"
)]}, outputs: {'best_hyperparameters': [Artifact(artifact: id: 3312416091851715625
base_type: DATASET
}
  value: INT
  key: "version"
properties {
}
  value: STRING
  key: "split_names"
properties {
}
  value: INT
  key: "span"
properties {
, artifact_type: name: "Examples"
}
  }
    }
      }
        }
string_value: "1.9.1"
        value {
        key: "__value__"
      fields {
    struct_value {
  value {
  key: "tfx_version"
custom_properties {
}
  }
    string_value: "[\"eval\", \"train\"]"
  value {
  key: "split_names"
properties {
uri: "gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/874401645461/resnet50-tfx-pipeline-tuner-test-20220813040932/Transform_2187476734192910336/transformed_examples"
)], 'examples': [Artifact(artifact: id: 6958007971664455536
, artifact_type: name: "TransformGraph"
}
  }
    }
      }
        }
string_value: "1.9.1"
        value {
        key: "__value__"
      fields {
    struct_value {
  value {
  key: "tfx_version"
custom_properties {
uri: "gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/874401645461/resnet50-tfx-pipeline-tuner-test-20220813040932/Transform_2187476734192910336/transform_graph"
I0813 04:24:15.720419 139812382340928 run_executor.py:141] Executor tfx.extensions.google_cloud_ai_platform.tuner.executor._WorkerExecutor do: inputs: {'transform_graph': [Artifact(artifact: id: 3040439057790690801
2022-08-13 04:24:15.695750: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-13 04:24:15.694653: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-13 04:24:15.510332: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022/08/13 04:24:10 No id provided.
File system has been successfully mounted.
Mounting file system "gcsfuse"...
Opening GCS connection...
File system has been successfully mounted.
Mounting file system "gcsfuse"...
Opening GCS connection...
File system has been successfully mounted.
Mounting file system "gcsfuse"...
Opening GCS connection...



Job tfx_tuner_20220813041519 is queued.

Job creation request has been successfully validated.

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 24

Top GitHub Comments

1 reaction
1025KB commented, Aug 17, 2022

Yep, you need to add ENABLE_VERTEX_KEY and VERTEX_REGION_KEY in addition to TUNING_ARGS_KEY and REMOTE_TRIALS_WORKING_DIR_KEY.
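
For anyone landing here, a minimal sketch of what a `custom_config` with all four keys might look like. The first two key strings appear in the log above. The string values for `ENABLE_VERTEX_KEY` and `VERTEX_REGION_KEY` are my assumption of what the TFX constants resolve to, so in a real pipeline import the constants from the TFX `google_cloud_ai_platform` extension instead of hard-coding them. `build_tuner_custom_config` is a hypothetical helper, not a TFX function, and the tuning-args payload is left as a placeholder because its shape probably has to follow the Vertex AI job format once Vertex is enabled.

```python
import json

# Key strings that appear verbatim in the log of this issue.
TUNING_ARGS_KEY = "ai_platform_tuning_args"
REMOTE_TRIALS_WORKING_DIR_KEY = "remote_trials_working_dir"

# ASSUMPTION: believed string values of the TFX constants of the same name.
# Prefer importing the constants from the TFX extension in real code.
ENABLE_VERTEX_KEY = "ai_platform_enable_vertex"
VERTEX_REGION_KEY = "ai_platform_vertex_region"


def build_tuner_custom_config(tuning_args, trials_dir, region):
    """Hypothetical helper: assemble the Tuner custom_config with Vertex enabled."""
    return {
        TUNING_ARGS_KEY: tuning_args,
        REMOTE_TRIALS_WORKING_DIR_KEY: trials_dir,
        ENABLE_VERTEX_KEY: True,
        VERTEX_REGION_KEY: region,
    }


custom_config = build_tuner_custom_config(
    # Placeholder payload; the real job spec is not reproduced here.
    tuning_args={"project": "gcp-ml-172005"},
    trials_dir=(
        "gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/"
        "resnet50-tfx-pipeline-tuner-test/trials"
    ),
    region="us-central1",
)

print(json.dumps(custom_config, indent=2, sort_keys=True))
```

The resulting dict would be passed as `custom_config` to the extension Tuner component, the same place the two existing keys are already set.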

1 reaction
1025KB commented, Aug 15, 2022

On Cloud (KubeflowDagRunner + extension.Tuner) you can also just use KerasTuner, e.g., RandomSearch, in your tuner_fn. I want to know whether the issue is in CloudTuner or in another part of the workflow.
