CloudTuner error
I successfully started a Vizier job via CloudTuner, but the job failed.
I looked through the logs and found no errors, and the training appears to have completed successfully. Could you take a look at what happened? The logs below are listed newest first, so they should be read from bottom to top.
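Since the export is newest-first, it can be easier to flip it before reading. The following is a minimal sketch, assuming the log has been saved as plain text with one exported line per row; `to_chronological` is a hypothetical helper name, not part of TFX or Cloud Logging. Records that were exported as a single multi-line entry (such as the Keras download progress bar) would come out reversed internally.

```python
def to_chronological(log_text: str) -> str:
    """Reverse a newest-first log export line by line so it reads oldest-first."""
    lines = log_text.splitlines()
    return "\n".join(reversed(lines))


# A few lines taken from the export below, in the order they appear there.
sample = "\n".join([
    "Job failed.",
    "Finished tearing down training program.",
    "Job tfx_tuner_20220813041519 is queued.",
    "Job creation request has been successfully validated.",
])

if __name__ == "__main__":
    print(to_chronological(sample))
```

Applied to the sample, the first line becomes the job validation message and the last line becomes `Job failed.`, which matches the order in which the events happened.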
jsonPayload.message
--
Job failed.
Finished tearing down training program.
2022/08/13 04:24:30 No id provided.
. Setting to DenseTensor.
}
size: 1
I0813 04:24:27.917382 139812382340928 tensor_representation_util.py:347] Feature label_xf has a shape dim {
. Setting to DenseTensor.
}
size: 3
dim {
}
size: 224
dim {
}
size: 224
I0813 04:24:27.917122 139812382340928 tensor_representation_util.py:347] Feature image_xf has a shape dim {
. Setting to DenseTensor.
}
size: 1
I0813 04:24:27.783987 139812382340928 tensor_representation_util.py:347] Feature label_xf has a shape dim {
. Setting to DenseTensor.
}
size: 3
dim {
}
size: 224
dim {
}
size: 224
I0813 04:24:27.783725 139812382340928 tensor_representation_util.py:347] Feature image_xf has a shape dim {
. Setting to DenseTensor.
}
size: 1
I0813 04:24:27.573800 139812382340928 tensor_representation_util.py:347] Feature label_xf has a shape dim {
. Setting to DenseTensor.
}
size: 3
dim {
}
size: 224
dim {
}
size: 224
I0813 04:24:27.573541 139812382340928 tensor_representation_util.py:347] Feature image_xf has a shape dim {
. Setting to DenseTensor.
}
size: 1
I0813 04:24:27.098982 139812382340928 tensor_representation_util.py:347] Feature label_xf has a shape dim {
. Setting to DenseTensor.
}
size: 3
dim {
}
size: 224
dim {
}
size: 224
I0813 04:24:27.098610 139812382340928 tensor_representation_util.py:347] Feature image_xf has a shape dim {
I0813 04:24:26.905468 139812382340928 model.py:33] _________________________________________________________________
I0813 04:24:26.905363 139812382340928 model.py:33] Non-trainable params: 23,587,712
I0813 04:24:26.905245 139812382340928 model.py:33] Trainable params: 20,490
I0813 04:24:26.905140 139812382340928 model.py:33] Total params: 23,608,202
I0813 04:24:26.900732 139812382340928 model.py:33] =================================================================
I0813 04:24:26.900615 139812382340928 model.py:33]
I0813 04:24:26.900457 139812382340928 model.py:33] dense (Dense) (None, 10) 20490
I0813 04:24:26.900074 139812382340928 model.py:33]
I0813 04:24:26.899939 139812382340928 model.py:33] dropout (Dropout) (None, 2048) 0
I0813 04:24:26.899660 139812382340928 model.py:33]
I0813 04:24:26.899552 139812382340928 model.py:33] resnet50 (Functional) (None, 2048) 23587712
I0813 04:24:26.895043 139812382340928 model.py:33] =================================================================
I0813 04:24:26.894924 139812382340928 model.py:33] Layer (type) Output Shape Param #
I0813 04:24:26.894763 139812382340928 model.py:33] _________________________________________________________________
I0813 04:24:26.894547 139812382340928 model.py:33] Model: "sequential"
8192/94765736 [..............................] - ETA: 0s 5955584/94765736 [>.............................] - ETA: 0s 14000128/94765736 [===>..........................] - ETA: 0s 20971520/94765736 [=====>........................] - ETA: 0s 28442624/94765736 [========>.....................] - ETA: 0s 36356096/94765736 [==========>...................] - ETA: 0s 44326912/94765736 [=============>................] - ETA: 0s 52133888/94765736 [===============>..............] - ETA: 0s 60121088/94765736 [==================>...........] - ETA: 0s 67960832/94765736 [====================>.........] - ETA: 0s 75710464/94765736 [======================>.......] - ETA: 0s 83501056/94765736 [=========================>....] - ETA: 0s 91258880/94765736 [===========================>..] - ETA: 0s 94765736/94765736 [==============================] - 1s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
tensorflow_cloud.utils.google_api_client.optout_metrics_reporting().
to opt-out, you may do so by running
please refer to https://policies.google.com/privacy. If you wish
Cloud Services in accordance with Google privacy policy, for more information
This application reports technical and operational details of your usage of
2022-08-13 04:24:23.295037: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10807 MB memory: -> device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7
2022-08-13 04:24:23.294182: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-13 04:24:23.293289: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-13 04:24:23.292287: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-13 04:24:22.735514: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-13 04:24:22.734559: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-13 04:24:22.733457: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-13 04:24:22.732728: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
I0813 04:24:21.739562 139812382340928 google_api_client.py:132] Detected running in DL_CONTAINER environment.
Load existing study...
I0813 04:24:21.737804 139812382340928 tuner.py:197] Study already exists: projects/gcp-ml-172005/locations/us-central1/studies/CloudTuner_study_20220813_042421.
Load existing study...
INFO:tensorflow:Study already exists: projects/gcp-ml-172005/locations/us-central1/studies/CloudTuner_study_20220813_042421.
I0813 04:24:21.696875 139812382340928 tuner.py:197] {'name': 'projects/874401645461/locations/us-central1/studies/CloudTuner_study_20220813_042421', 'studyConfig': {'metrics': [{'goal': 'MAXIMIZE', 'metric': 'val_sparse_categorical_accuracy'}], 'parameters': [{'parameter': 'learning_rate', 'type': 'DISCRETE', 'discreteValueSpec': {'values': [0.001, 0.01]}}], 'automatedStoppingConfig': {'decayCurveStoppingConfig': {'useElapsedTime': True}}}, 'state': 'ACTIVE', 'createTime': '2022-08-13T04:24:21Z'}
INFO:tensorflow:{'name': 'projects/874401645461/locations/us-central1/studies/CloudTuner_study_20220813_042421', 'studyConfig': {'metrics': [{'goal': 'MAXIMIZE', 'metric': 'val_sparse_categorical_accuracy'}], 'parameters': [{'parameter': 'learning_rate', 'type': 'DISCRETE', 'discreteValueSpec': {'values': [0.001, 0.01]}}], 'automatedStoppingConfig': {'decayCurveStoppingConfig': {'useElapsedTime': True}}}, 'state': 'ACTIVE', 'createTime': '2022-08-13T04:24:21Z'}
I0813 04:24:21.171569 139812382340928 google_api_client.py:132] Detected running in DL_CONTAINER environment.
I0813 04:24:21.171575 139812382340928 google_api_client.py:132] Detected running in DL_CONTAINER environment.
tensorflow_cloud.utils.google_api_client.optout_metrics_reporting().
to opt-out, you may do so by running
please refer to https://policies.google.com/privacy. If you wish
Cloud Services in accordance with Google privacy policy, for more information
This application reports technical and operational details of your usage of
I0813 04:24:21.170827 139812382340928 google_api_client.py:185]
tensorflow_cloud.utils.google_api_client.optout_metrics_reporting().
to opt-out, you may do so by running
please refer to https://policies.google.com/privacy. If you wish
Cloud Services in accordance with Google privacy policy, for more information
This application reports technical and operational details of your usage of
I0813 04:24:21.170828 139812382340928 google_api_client.py:185]
W0813 04:24:21.157318 139812382340928 examples_utils.py:50] Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
W0813 04:24:21.157322 139812382340928 examples_utils.py:50] Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
W0813 04:24:21.157067 139812382340928 examples_utils.py:50] Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
W0813 04:24:21.157056 139812382340928 examples_utils.py:50] Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
W0813 04:24:21.156749 139812382340928 examples_utils.py:50] Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
W0813 04:24:21.156746 139812382340928 examples_utils.py:50] Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
I0813 04:24:21.156335 139812382340928 fn_args_utils.py:138] Evaluate on the 'eval' split when eval_args.splits is not set.
I0813 04:24:21.156317 139812382340928 fn_args_utils.py:138] Evaluate on the 'eval' split when eval_args.splits is not set.
I0813 04:24:21.156160 139812382340928 fn_args_utils.py:134] Train on the 'train' split when train_args.splits is not set.
I0813 04:24:21.156160 139812382340928 fn_args_utils.py:134] Train on the 'train' split when train_args.splits is not set.
I0813 04:24:20.723299 139812382340928 udf_utils.py:48] udf_utils.get_fn {'custom_config': '{"ai_platform_tuning_args": {"masterConfig": {"acceleratorConfig": {"count": 1, "type": "NVIDIA_TESLA_K80"}, "imageUri": "gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test"}, "masterType": "n1-standard-4", "project": "gcp-ml-172005", "region": "us-central1", "scaleTier": "CUSTOM", "serviceAccount": "vizier@gcp-ml-172005.iam.gserviceaccount.com", "workerConfig": {"acceleratorConfig": {"count": 1, "type": "NVIDIA_TESLA_K80"}, "imageUri": "gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test"}, "workerCount": 3, "workerType": "n1-standard-4"}, "remote_trials_working_dir": "gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/trials"}', 'eval_args': '{\n "num_steps": 4\n}', 'train_args': '{\n "num_steps": 160\n}', 'tune_args': '{\n "num_parallel_trials": 3\n}', 'tuner_fn': 'models.model.cloud_tuner_fn'} 'tuner_fn'
I0813 04:24:20.723111 139812382340928 executor.py:212] Binding chief oracle server at: 0.0.0.0:2222
I0813 04:24:20.722659 139812382340928 executor.py:200] chief_oracle() starting...
I0813 04:24:20.722256 139812382340928 udf_utils.py:48] udf_utils.get_fn {'custom_config': '{"ai_platform_tuning_args": {"masterConfig": {"acceleratorConfig": {"count": 1, "type": "NVIDIA_TESLA_K80"}, "imageUri": "gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test"}, "masterType": "n1-standard-4", "project": "gcp-ml-172005", "region": "us-central1", "scaleTier": "CUSTOM", "serviceAccount": "vizier@gcp-ml-172005.iam.gserviceaccount.com", "workerConfig": {"acceleratorConfig": {"count": 1, "type": "NVIDIA_TESLA_K80"}, "imageUri": "gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test"}, "workerCount": 3, "workerType": "n1-standard-4"}, "remote_trials_working_dir": "gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/trials"}', 'eval_args': '{\n "num_steps": 4\n}', 'train_args': '{\n "num_steps": 160\n}', 'tune_args': '{\n "num_parallel_trials": 3\n}', 'tuner_fn': 'models.model.cloud_tuner_fn'} 'tuner_fn'
I0813 04:24:20.722024 139812382340928 executor.py:275] Setting KERASTUNER_TUNER_ID with tfx-tuner-master-0
I0813 04:24:20.721865 139812382340928 executor.py:267] Oracle chief is known to be at: cmle-training-master-afa651e2fc-0:2222
I0813 04:24:20.720906 139812382340928 executor.py:233] Chief oracle started at PID: 16
I0813 04:24:20.710414 139812382340928 run_executor.py:155] Starting executor
I0813 04:24:20.709932 139812382340928 executor.py:332] Tuner ID is: tfx-tuner-master-0
I0813 04:24:20.709692 139812382340928 executor.py:300] Cluster spec initalized with: {'cluster': {'master': ['cmle-training-master-afa651e2fc-0:2222'], 'worker': ['cmle-training-worker-afa651e2fc-0:2222', 'cmle-training-worker-afa651e2fc-1:2222']}, 'environment': 'cloud', 'task': {'type': 'master', 'index': 0}, 'job': '{\n "scale_tier": "CUSTOM",\n "master_type": "n1-standard-4",\n "worker_type": "n1-standard-4",\n "worker_count": "2",\n "region": "us-central1",\n "master_config": {\n "accelerator_config": {\n "count": "1",\n "type": "NVIDIA_TESLA_K80"\n },\n "image_uri": "gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test",\n "container_command": ["python", "-m", "tfx.scripts.run_executor", "--executor_class_path", "tfx.extensions.google_cloud_ai_platform.tuner.executor._WorkerExecutor", "--inputs", "{\\"transform_graph\\": [{\\"artifact\\": {\\"id\\": \\"3040439057790690801\\", \\"uri\\": \\"gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/874401645461/resnet50-tfx-pipeline-tuner-test-20220813040932/Transform_2187476734192910336/transform_graph\\", \\"custom_properties\\": {\\"tfx_version\\": {\\"struct_value\\": {\\"__value__\\": \\"1.9.1\\"}}}}, \\"artifact_type\\": {\\"name\\": \\"TransformGraph\\"}, \\"__artifact_class_module__\\": \\"tfx.types.standard_artifacts\\", \\"__artifact_class_name__\\": \\"TransformGraph\\"}], \\"examples\\": [{\\"artifact\\": {\\"id\\": \\"6958007971664455536\\", \\"uri\\": \\"gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/874401645461/resnet50-tfx-pipeline-tuner-test-20220813040932/Transform_2187476734192910336/transformed_examples\\", \\"properties\\": {\\"split_names\\": {\\"string_value\\": \\"[\\\\\\"eval\\\\\\", \\\\\\"train\\\\\\"]\\"}}, \\"custom_properties\\": {\\"tfx_version\\": {\\"struct_value\\": {\\"__value__\\": \\"1.9.1\\"}}}}, \\"artifact_type\\": {\\"name\\": \\"Examples\\", \\"properties\\": {\\"span\\": \\"INT\\", 
\\"split_names\\": \\"STRING\\", \\"version\\": \\"INT\\"}, \\"base_type\\": \\"DATASET\\"}, \\"__artifact_class_module__\\": \\"tfx.types.standard_artifacts\\", \\"__artifact_class_name__\\": \\"Examples\\"}]}", "--outputs", "{\\"best_hyperparameters\\": [{\\"artifact\\": {\\"id\\": \\"3312416091851715625\\", \\"uri\\": \\"gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/874401645461/resnet50-tfx-pipeline-tuner-test-20220813040932/Tuner_-7035895302661865472/best_hyperparameters\\"}, \\"artifact_type\\": {\\"name\\": \\"HyperParameters\\"}, \\"__artifact_class_module__\\": \\"tfx.types.standard_artifacts\\", \\"__artifact_class_name__\\": \\"HyperParameters\\"}]}", "--exec-properties", "{\\"custom_config\\": \\"{\\\\\\"ai_platform_tuning_args\\\\\\": {\\\\\\"masterConfig\\\\\\": {\\\\\\"acceleratorConfig\\\\\\": {\\\\\\"count\\\\\\": 1, \\\\\\"type\\\\\\": \\\\\\"NVIDIA_TESLA_K80\\\\\\"}, \\\\\\"imageUri\\\\\\": \\\\\\"gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test\\\\\\"}, \\\\\\"masterType\\\\\\": \\\\\\"n1-standard-4\\\\\\", \\\\\\"project\\\\\\": \\\\\\"gcp-ml-172005\\\\\\", \\\\\\"region\\\\\\": \\\\\\"us-central1\\\\\\", \\\\\\"scaleTier\\\\\\": \\\\\\"CUSTOM\\\\\\", \\\\\\"serviceAccount\\\\\\": \\\\\\"vizier@gcp-ml-172005.iam.gserviceaccount.com\\\\\\", \\\\\\"workerConfig\\\\\\": {\\\\\\"acceleratorConfig\\\\\\": {\\\\\\"count\\\\\\": 1, \\\\\\"type\\\\\\": \\\\\\"NVIDIA_TESLA_K80\\\\\\"}, \\\\\\"imageUri\\\\\\": \\\\\\"gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test\\\\\\"}, \\\\\\"workerCount\\\\\\": 3, \\\\\\"workerType\\\\\\": \\\\\\"n1-standard-4\\\\\\"}, \\\\\\"remote_trials_working_dir\\\\\\": \\\\\\"gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/trials\\\\\\"}\\", \\"eval_args\\": \\"{\\\\n \\\\\\"num_steps\\\\\\": 4\\\\n}\\", \\"train_args\\": \\"{\\\\n \\\\\\"num_steps\\\\\\": 160\\\\n}\\", \\"tune_args\\": \\"{\\\\n \\\\\\"num_parallel_trials\\\\\\": 
3\\\\n}\\", \\"tuner_fn\\": \\"models.model.cloud_tuner_fn\\"}"]\n },\n "worker_config": {\n "accelerator_config": {\n "count": "1",\n "type": "NVIDIA_TESLA_K80"\n },\n "image_uri": "gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test"\n },\n "service_account": "vizier@gcp-ml-172005.iam.gserviceaccount.com"\n}'}
I0813 04:24:20.709398 139812382340928 executor.py:292] Initializing cluster spec...
I0813 04:24:16.823370 139812382340928 executor.py:43] tensorflow_text is not available: No module named 'tensorflow_text'
I0813 04:24:16.796857 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.796637 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.795921 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.795674 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.784358 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.783092 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.782919 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.781281 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.780695 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.780493 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.779530 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.779021 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.415254 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.415065 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.414037 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.413864 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.412977 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.412753 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.412003 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.411813 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.410855 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.410645 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.409852 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.409629 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.408295 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.408134 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.407290 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.407044 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.277025 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.276487 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.275387 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.274608 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.274452 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.211828 139812382340928 model_util.py:68] struct2tensor is not available: No module named 'struct2tensor'
I0813 04:24:16.211477 139812382340928 model_util.py:63] tensorflow_decision_forests is not available: No module named 'tensorflow_decision_forests'
I0813 04:24:16.211113 139812382340928 model_util.py:58] tensorflow_text is not available: No module named 'tensorflow_text'
I0813 04:24:16.210595 139812382340928 model_util.py:53] tensorflow_ranking is not available: No module named 'tensorflow_ranking'
I0813 04:24:16.210203 139812382340928 model_util.py:44] imported tensorflow_io
I0813 04:24:15.824912 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:15.824463 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
)]}, exec_properties: {'custom_config': '{"ai_platform_tuning_args": {"masterConfig": {"acceleratorConfig": {"count": 1, "type": "NVIDIA_TESLA_K80"}, "imageUri": "gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test"}, "masterType": "n1-standard-4", "project": "gcp-ml-172005", "region": "us-central1", "scaleTier": "CUSTOM", "serviceAccount": "vizier@gcp-ml-172005.iam.gserviceaccount.com", "workerConfig": {"acceleratorConfig": {"count": 1, "type": "NVIDIA_TESLA_K80"}, "imageUri": "gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test"}, "workerCount": 3, "workerType": "n1-standard-4"}, "remote_trials_working_dir": "gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/trials"}', 'eval_args': '{\n "num_steps": 4\n}', 'train_args': '{\n "num_steps": 160\n}', 'tune_args': '{\n "num_parallel_trials": 3\n}', 'tuner_fn': 'models.model.cloud_tuner_fn'}
, artifact_type: name: "HyperParameters"
uri: "gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/874401645461/resnet50-tfx-pipeline-tuner-test-20220813040932/Tuner_-7035895302661865472/best_hyperparameters"
)]}, outputs: {'best_hyperparameters': [Artifact(artifact: id: 3312416091851715625
base_type: DATASET
}
value: INT
key: "version"
properties {
}
value: STRING
key: "split_names"
properties {
}
value: INT
key: "span"
properties {
, artifact_type: name: "Examples"
}
}
}
}
}
string_value: "1.9.1"
value {
key: "__value__"
fields {
struct_value {
value {
key: "tfx_version"
custom_properties {
}
}
string_value: "[\"eval\", \"train\"]"
value {
key: "split_names"
properties {
uri: "gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/874401645461/resnet50-tfx-pipeline-tuner-test-20220813040932/Transform_2187476734192910336/transformed_examples"
)], 'examples': [Artifact(artifact: id: 6958007971664455536
, artifact_type: name: "TransformGraph"
}
}
}
}
}
string_value: "1.9.1"
value {
key: "__value__"
fields {
struct_value {
value {
key: "tfx_version"
custom_properties {
uri: "gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/874401645461/resnet50-tfx-pipeline-tuner-test-20220813040932/Transform_2187476734192910336/transform_graph"
I0813 04:24:15.720419 139812382340928 run_executor.py:141] Executor tfx.extensions.google_cloud_ai_platform.tuner.executor._WorkerExecutor do: inputs: {'transform_graph': [Artifact(artifact: id: 3040439057790690801
2022-08-13 04:24:15.695750: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-13 04:24:15.694653: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-13 04:24:15.510332: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022/08/13 04:24:10 No id provided.
File system has been successfully mounted.
Mounting file system "gcsfuse"...
Opening GCS connection...
File system has been successfully mounted.
Mounting file system "gcsfuse"...
Opening GCS connection...
File system has been successfully mounted.
Mounting file system "gcsfuse"...
Opening GCS connection...
Job tfx_tuner_20220813041519 is queued.
Job creation request has been successfully validated.
jsonPayload.message
Job failed.
Finished tearing down training program.
2022/08/13 04:25:56 No id provided.
2022/08/13 04:25:46 No id provided.
2022/08/13 04:25:01 No id provided.
2022/08/13 04:24:54 No id provided.
2022/08/13 04:24:30 No id provided.
. Setting to DenseTensor.
}
size: 1
I0813 04:24:27.917382 139812382340928 tensor_representation_util.py:347] Feature label_xf has a shape dim {
. Setting to DenseTensor.
}
size: 3
dim {
}
size: 224
dim {
}
size: 224
I0813 04:24:27.917122 139812382340928 tensor_representation_util.py:347] Feature image_xf has a shape dim {
. Setting to DenseTensor.
}
size: 1
I0813 04:24:27.783987 139812382340928 tensor_representation_util.py:347] Feature label_xf has a shape dim {
. Setting to DenseTensor.
}
size: 3
dim {
}
size: 224
dim {
}
size: 224
I0813 04:24:27.783725 139812382340928 tensor_representation_util.py:347] Feature image_xf has a shape dim {
. Setting to DenseTensor.
}
size: 1
I0813 04:24:27.573800 139812382340928 tensor_representation_util.py:347] Feature label_xf has a shape dim {
. Setting to DenseTensor.
}
size: 3
dim {
}
size: 224
dim {
}
size: 224
I0813 04:24:27.573541 139812382340928 tensor_representation_util.py:347] Feature image_xf has a shape dim {
. Setting to DenseTensor.
}
size: 1
I0813 04:24:27.098982 139812382340928 tensor_representation_util.py:347] Feature label_xf has a shape dim {
. Setting to DenseTensor.
}
size: 3
dim {
}
size: 224
dim {
}
size: 224
I0813 04:24:27.098610 139812382340928 tensor_representation_util.py:347] Feature image_xf has a shape dim {
I0813 04:24:26.905468 139812382340928 model.py:33] _________________________________________________________________
I0813 04:24:26.905363 139812382340928 model.py:33] Non-trainable params: 23,587,712
I0813 04:24:26.905245 139812382340928 model.py:33] Trainable params: 20,490
I0813 04:24:26.905140 139812382340928 model.py:33] Total params: 23,608,202
I0813 04:24:26.900732 139812382340928 model.py:33] =================================================================
I0813 04:24:26.900615 139812382340928 model.py:33]
I0813 04:24:26.900457 139812382340928 model.py:33] dense (Dense) (None, 10) 20490
I0813 04:24:26.900074 139812382340928 model.py:33]
I0813 04:24:26.899939 139812382340928 model.py:33] dropout (Dropout) (None, 2048) 0
I0813 04:24:26.899660 139812382340928 model.py:33]
I0813 04:24:26.899552 139812382340928 model.py:33] resnet50 (Functional) (None, 2048) 23587712
I0813 04:24:26.895043 139812382340928 model.py:33] =================================================================
I0813 04:24:26.894924 139812382340928 model.py:33] Layer (type) Output Shape Param #
I0813 04:24:26.894763 139812382340928 model.py:33] _________________________________________________________________
I0813 04:24:26.894547 139812382340928 model.py:33] Model: "sequential"
8192/94765736 [..............................] - ETA: 0s
5955584/94765736 [>.............................] - ETA: 0s
14000128/94765736 [===>..........................] - ETA: 0s
20971520/94765736 [=====>........................] - ETA: 0s
28442624/94765736 [========>.....................] - ETA: 0s
36356096/94765736 [==========>...................] - ETA: 0s
44326912/94765736 [=============>................] - ETA: 0s
52133888/94765736 [===============>..............] - ETA: 0s
60121088/94765736 [==================>...........] - ETA: 0s
67960832/94765736 [====================>.........] - ETA: 0s
75710464/94765736 [======================>.......] - ETA: 0s
83501056/94765736 [=========================>....] - ETA: 0s
91258880/94765736 [===========================>..] - ETA: 0s
94765736/94765736 [==============================] - 1s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
tensorflow_cloud.utils.google_api_client.optout_metrics_reporting().
to opt-out, you may do so by running
please refer to https://policies.google.com/privacy. If you wish
Cloud Services in accordance with Google privacy policy, for more information
This application reports technical and operational details of your usage of
2022-08-13 04:24:23.295037: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10807 MB memory: -> device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7
2022-08-13 04:24:23.294182: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-13 04:24:22.732728: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
I0813 04:24:21.739562 139812382340928 google_api_client.py:132] Detected running in DL_CONTAINER environment.
Load existing study...
I0813 04:24:21.737804 139812382340928 tuner.py:197] Study already exists: projects/gcp-ml-172005/locations/us-central1/studies/CloudTuner_study_20220813_042421.
Load existing study...
INFO:tensorflow:Study already exists: projects/gcp-ml-172005/locations/us-central1/studies/CloudTuner_study_20220813_042421.
I0813 04:24:21.696875 139812382340928 tuner.py:197] {'name': 'projects/874401645461/locations/us-central1/studies/CloudTuner_study_20220813_042421', 'studyConfig': {'metrics': [{'goal': 'MAXIMIZE', 'metric': 'val_sparse_categorical_accuracy'}], 'parameters': [{'parameter': 'learning_rate', 'type': 'DISCRETE', 'discreteValueSpec': {'values': [0.001, 0.01]}}], 'automatedStoppingConfig': {'decayCurveStoppingConfig': {'useElapsedTime': True}}}, 'state': 'ACTIVE', 'createTime': '2022-08-13T04:24:21Z'}
INFO:tensorflow:{'name': 'projects/874401645461/locations/us-central1/studies/CloudTuner_study_20220813_042421', 'studyConfig': {'metrics': [{'goal': 'MAXIMIZE', 'metric': 'val_sparse_categorical_accuracy'}], 'parameters': [{'parameter': 'learning_rate', 'type': 'DISCRETE', 'discreteValueSpec': {'values': [0.001, 0.01]}}], 'automatedStoppingConfig': {'decayCurveStoppingConfig': {'useElapsedTime': True}}}, 'state': 'ACTIVE', 'createTime': '2022-08-13T04:24:21Z'}
I0813 04:24:21.171569 139812382340928 google_api_client.py:132] Detected running in DL_CONTAINER environment.
I0813 04:24:21.171575 139812382340928 google_api_client.py:132] Detected running in DL_CONTAINER environment.
tensorflow_cloud.utils.google_api_client.optout_metrics_reporting().
to opt-out, you may do so by running
please refer to https://policies.google.com/privacy. If you wish
Cloud Services in accordance with Google privacy policy, for more information
This application reports technical and operational details of your usage of
I0813 04:24:21.170827 139812382340928 google_api_client.py:185]
tensorflow_cloud.utils.google_api_client.optout_metrics_reporting().
to opt-out, you may do so by running
please refer to https://policies.google.com/privacy. If you wish
Cloud Services in accordance with Google privacy policy, for more information
This application reports technical and operational details of your usage of
I0813 04:24:21.170828 139812382340928 google_api_client.py:185]
W0813 04:24:21.157318 139812382340928 examples_utils.py:50] Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
I0813 04:24:21.156335 139812382340928 fn_args_utils.py:138] Evaluate on the 'eval' split when eval_args.splits is not set.
I0813 04:24:21.156317 139812382340928 fn_args_utils.py:138] Evaluate on the 'eval' split when eval_args.splits is not set.
I0813 04:24:21.156160 139812382340928 fn_args_utils.py:134] Train on the 'train' split when train_args.splits is not set.
I0813 04:24:21.156160 139812382340928 fn_args_utils.py:134] Train on the 'train' split when train_args.splits is not set.
I0813 04:24:20.723299 139812382340928 udf_utils.py:48] udf_utils.get_fn {'custom_config': '{"ai_platform_tuning_args": {"masterConfig": {"acceleratorConfig": {"count": 1, "type": "NVIDIA_TESLA_K80"}, "imageUri": "gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test"}, "masterType": "n1-standard-4", "project": "gcp-ml-172005", "region": "us-central1", "scaleTier": "CUSTOM", "serviceAccount": "vizier@gcp-ml-172005.iam.gserviceaccount.com", "workerConfig": {"acceleratorConfig": {"count": 1, "type": "NVIDIA_TESLA_K80"}, "imageUri": "gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test"}, "workerCount": 3, "workerType": "n1-standard-4"}, "remote_trials_working_dir": "gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/trials"}', 'eval_args': '{\n "num_steps": 4\n}', 'train_args': '{\n "num_steps": 160\n}', 'tune_args': '{\n "num_parallel_trials": 3\n}', 'tuner_fn': 'models.model.cloud_tuner_fn'} 'tuner_fn'
I0813 04:24:20.723111 139812382340928 executor.py:212] Binding chief oracle server at: 0.0.0.0:2222
I0813 04:24:20.722659 139812382340928 executor.py:200] chief_oracle() starting...
I0813 04:24:20.722256 139812382340928 udf_utils.py:48] udf_utils.get_fn {'custom_config': '{"ai_platform_tuning_args": {"masterConfig": {"acceleratorConfig": {"count": 1, "type": "NVIDIA_TESLA_K80"}, "imageUri": "gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test"}, "masterType": "n1-standard-4", "project": "gcp-ml-172005", "region": "us-central1", "scaleTier": "CUSTOM", "serviceAccount": "vizier@gcp-ml-172005.iam.gserviceaccount.com", "workerConfig": {"acceleratorConfig": {"count": 1, "type": "NVIDIA_TESLA_K80"}, "imageUri": "gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test"}, "workerCount": 3, "workerType": "n1-standard-4"}, "remote_trials_working_dir": "gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/trials"}', 'eval_args': '{\n "num_steps": 4\n}', 'train_args': '{\n "num_steps": 160\n}', 'tune_args': '{\n "num_parallel_trials": 3\n}', 'tuner_fn': 'models.model.cloud_tuner_fn'} 'tuner_fn'
I0813 04:24:20.722024 139812382340928 executor.py:275] Setting KERASTUNER_TUNER_ID with tfx-tuner-master-0
I0813 04:24:20.721865 139812382340928 executor.py:267] Oracle chief is known to be at: cmle-training-master-afa651e2fc-0:2222
I0813 04:24:20.720906 139812382340928 executor.py:233] Chief oracle started at PID: 16
I0813 04:24:20.710414 139812382340928 run_executor.py:155] Starting executor
I0813 04:24:20.709932 139812382340928 executor.py:332] Tuner ID is: tfx-tuner-master-0
I0813 04:24:20.709692 139812382340928 executor.py:300] Cluster spec initalized with: {'cluster': {'master': ['cmle-training-master-afa651e2fc-0:2222'], 'worker': ['cmle-training-worker-afa651e2fc-0:2222', 'cmle-training-worker-afa651e2fc-1:2222']}, 'environment': 'cloud', 'task': {'type': 'master', 'index': 0}, 'job': '{\n "scale_tier": "CUSTOM",\n "master_type": "n1-standard-4",\n "worker_type": "n1-standard-4",\n "worker_count": "2",\n "region": "us-central1",\n "master_config": {\n "accelerator_config": {\n "count": "1",\n "type": "NVIDIA_TESLA_K80"\n },\n "image_uri": "gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test",\n "container_command": ["python", "-m", "tfx.scripts.run_executor", "--executor_class_path", "tfx.extensions.google_cloud_ai_platform.tuner.executor._WorkerExecutor", "--inputs", "{\\"transform_graph\\": [{\\"artifact\\": {\\"id\\": \\"3040439057790690801\\", \\"uri\\": \\"gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/874401645461/resnet50-tfx-pipeline-tuner-test-20220813040932/Transform_2187476734192910336/transform_graph\\", \\"custom_properties\\": {\\"tfx_version\\": {\\"struct_value\\": {\\"__value__\\": \\"1.9.1\\"}}}}, \\"artifact_type\\": {\\"name\\": \\"TransformGraph\\"}, \\"__artifact_class_module__\\": \\"tfx.types.standard_artifacts\\", \\"__artifact_class_name__\\": \\"TransformGraph\\"}], \\"examples\\": [{\\"artifact\\": {\\"id\\": \\"6958007971664455536\\", \\"uri\\": \\"gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/874401645461/resnet50-tfx-pipeline-tuner-test-20220813040932/Transform_2187476734192910336/transformed_examples\\", \\"properties\\": {\\"split_names\\": {\\"string_value\\": \\"[\\\\\\"eval\\\\\\", \\\\\\"train\\\\\\"]\\"}}, \\"custom_properties\\": {\\"tfx_version\\": {\\"struct_value\\": {\\"__value__\\": \\"1.9.1\\"}}}}, \\"artifact_type\\": {\\"name\\": 
\\"Examples\\", \\"properties\\": {\\"span\\": \\"INT\\", \\"split_names\\": \\"STRING\\", \\"version\\": \\"INT\\"}, \\"base_type\\": \\"DATASET\\"}, \\"__artifact_class_module__\\": \\"tfx.types.standard_artifacts\\", \\"__artifact_class_name__\\": \\"Examples\\"}]}", "--outputs", "{\\"best_hyperparameters\\": [{\\"artifact\\": {\\"id\\": \\"3312416091851715625\\", \\"uri\\": \\"gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/874401645461/resnet50-tfx-pipeline-tuner-test-20220813040932/Tuner_-7035895302661865472/best_hyperparameters\\"}, \\"artifact_type\\": {\\"name\\": \\"HyperParameters\\"}, \\"__artifact_class_module__\\": \\"tfx.types.standard_artifacts\\", \\"__artifact_class_name__\\": \\"HyperParameters\\"}]}", "--exec-properties", "{\\"custom_config\\": \\"{\\\\\\"ai_platform_tuning_args\\\\\\": {\\\\\\"masterConfig\\\\\\": {\\\\\\"acceleratorConfig\\\\\\": {\\\\\\"count\\\\\\": 1, \\\\\\"type\\\\\\": \\\\\\"NVIDIA_TESLA_K80\\\\\\"}, \\\\\\"imageUri\\\\\\": \\\\\\"gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test\\\\\\"}, \\\\\\"masterType\\\\\\": \\\\\\"n1-standard-4\\\\\\", \\\\\\"project\\\\\\": \\\\\\"gcp-ml-172005\\\\\\", \\\\\\"region\\\\\\": \\\\\\"us-central1\\\\\\", \\\\\\"scaleTier\\\\\\": \\\\\\"CUSTOM\\\\\\", \\\\\\"serviceAccount\\\\\\": \\\\\\"vizier@gcp-ml-172005.iam.gserviceaccount.com\\\\\\", \\\\\\"workerConfig\\\\\\": {\\\\\\"acceleratorConfig\\\\\\": {\\\\\\"count\\\\\\": 1, \\\\\\"type\\\\\\": \\\\\\"NVIDIA_TESLA_K80\\\\\\"}, \\\\\\"imageUri\\\\\\": \\\\\\"gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test\\\\\\"}, \\\\\\"workerCount\\\\\\": 3, \\\\\\"workerType\\\\\\": \\\\\\"n1-standard-4\\\\\\"}, \\\\\\"remote_trials_working_dir\\\\\\": 
\\\\\\"gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/trials\\\\\\"}\\", \\"eval_args\\": \\"{\\\\n \\\\\\"num_steps\\\\\\": 4\\\\n}\\", \\"train_args\\": \\"{\\\\n \\\\\\"num_steps\\\\\\": 160\\\\n}\\", \\"tune_args\\": \\"{\\\\n \\\\\\"num_parallel_trials\\\\\\": 3\\\\n}\\", \\"tuner_fn\\": \\"models.model.cloud_tuner_fn\\"}"]\n },\n "worker_config": {\n "accelerator_config": {\n "count": "1",\n "type": "NVIDIA_TESLA_K80"\n },\n "image_uri": "gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test"\n },\n "service_account": "vizier@gcp-ml-172005.iam.gserviceaccount.com"\n}'}
I0813 04:24:20.709398 139812382340928 executor.py:292] Initializing cluster spec...
I0813 04:24:16.823370 139812382340928 executor.py:43] tensorflow_text is not available: No module named 'tensorflow_text'
I0813 04:24:16.796857 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:16.211828 139812382340928 model_util.py:68] struct2tensor is not available: No module named 'struct2tensor'
I0813 04:24:16.211477 139812382340928 model_util.py:63] tensorflow_decision_forests is not available: No module named 'tensorflow_decision_forests'
I0813 04:24:16.211113 139812382340928 model_util.py:58] tensorflow_text is not available: No module named 'tensorflow_text'
I0813 04:24:16.210595 139812382340928 model_util.py:53] tensorflow_ranking is not available: No module named 'tensorflow_ranking'
I0813 04:24:16.210203 139812382340928 model_util.py:44] imported tensorflow_io
I0813 04:24:15.824912 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
I0813 04:24:15.824463 139812382340928 native_type_compatibility.py:250] Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
)]}, exec_properties: {'custom_config': '{"ai_platform_tuning_args": {"masterConfig": {"acceleratorConfig": {"count": 1, "type": "NVIDIA_TESLA_K80"}, "imageUri": "gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test"}, "masterType": "n1-standard-4", "project": "gcp-ml-172005", "region": "us-central1", "scaleTier": "CUSTOM", "serviceAccount": "vizier@gcp-ml-172005.iam.gserviceaccount.com", "workerConfig": {"acceleratorConfig": {"count": 1, "type": "NVIDIA_TESLA_K80"}, "imageUri": "gcr.io/gcp-ml-172005/resnet50-tfx-pipeline-tuner-test"}, "workerCount": 3, "workerType": "n1-standard-4"}, "remote_trials_working_dir": "gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/trials"}', 'eval_args': '{\n "num_steps": 4\n}', 'train_args': '{\n "num_steps": 160\n}', 'tune_args': '{\n "num_parallel_trials": 3\n}', 'tuner_fn': 'models.model.cloud_tuner_fn'}
, artifact_type: name: "HyperParameters"
uri: "gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/874401645461/resnet50-tfx-pipeline-tuner-test-20220813040932/Tuner_-7035895302661865472/best_hyperparameters"
)]}, outputs: {'best_hyperparameters': [Artifact(artifact: id: 3312416091851715625
base_type: DATASET
}
value: INT
key: "version"
properties {
}
value: STRING
key: "split_names"
properties {
}
value: INT
key: "span"
properties {
, artifact_type: name: "Examples"
}
}
}
}
}
string_value: "1.9.1"
value {
key: "__value__"
fields {
struct_value {
value {
key: "tfx_version"
custom_properties {
}
}
string_value: "[\"eval\", \"train\"]"
value {
key: "split_names"
properties {
uri: "gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/874401645461/resnet50-tfx-pipeline-tuner-test-20220813040932/Transform_2187476734192910336/transformed_examples"
)], 'examples': [Artifact(artifact: id: 6958007971664455536
, artifact_type: name: "TransformGraph"
}
}
}
}
}
string_value: "1.9.1"
value {
key: "__value__"
fields {
struct_value {
value {
key: "tfx_version"
custom_properties {
uri: "gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/resnet50-tfx-pipeline-tuner-test/874401645461/resnet50-tfx-pipeline-tuner-test-20220813040932/Transform_2187476734192910336/transform_graph"
I0813 04:24:15.720419 139812382340928 run_executor.py:141] Executor tfx.extensions.google_cloud_ai_platform.tuner.executor._WorkerExecutor do: inputs: {'transform_graph': [Artifact(artifact: id: 3040439057790690801
2022-08-13 04:24:15.695750: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-13 04:24:15.694653: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-13 04:24:15.510332: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022/08/13 04:24:10 No id provided.
"File system has been successfully mounted.
"
"Mounting file system "gcsfuse"...
"
"Opening GCS connection...
"
"File system has been successfully mounted.
"
"Mounting file system "gcsfuse"...
"
"Opening GCS connection...
"
"File system has been successfully mounted.
"
"Mounting file system "gcsfuse"...
"
"Opening GCS connection...
"
Job tfx_tuner_20220813041519 is queued.
Job creation request has been successfully validated.
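Since the log export above is listed newest-first, a small helper can flip it into chronological order before reading. This is only a sketch: `to_chronological` is a hypothetical name, and it assumes the export is a plain line-by-line reversal, as this one appears to be (multi-line messages are reversed line by line too).

```python
def to_chronological(newest_first_text: str) -> str:
    """Reverse a newest-first log dump line by line so it reads top to bottom."""
    lines = newest_first_text.splitlines()
    return "\n".join(reversed(lines))


# The last two lines of the export above, in the order they were pasted.
sample = (
    "Job tfx_tuner_20220813041519 is queued.\n"
    "Job creation request has been successfully validated."
)

print(to_chronological(sample))
```

For a saved export, the same function can be applied to the file contents, e.g. `to_chronological(open(path).read())`, where `path` is wherever the log was saved.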
Top GitHub Comments
Yep, you need to add ENABLE_VERTEX_KEY and VERTEX_REGION_KEY in addition to TUNING_ARGS_KEY and REMOTE_TRIALS_WORKING_DIR_KEY.
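A rough sketch of what that `custom_config` could look like, written with plain strings so it runs without TFX installed. The first two key strings match the `custom_config` printed in the logs above; the two Vertex key strings are my assumption of what `ENABLE_VERTEX_KEY` and `VERTEX_REGION_KEY` resolve to, so import the constants from TFX in real code instead of hard-coding them. `build_tuner_custom_config` is a hypothetical helper, and the tuning args are a placeholder (they may need a different shape under Vertex, which is not shown here).

```python
import json

# Key strings as they appear in the logged custom_config.
TUNING_ARGS_KEY = "ai_platform_tuning_args"
REMOTE_TRIALS_WORKING_DIR_KEY = "remote_trials_working_dir"
# Assumed values of the TFX constants; verify against your TFX version.
ENABLE_VERTEX_KEY = "ai_platform_enable_vertex"
VERTEX_REGION_KEY = "ai_platform_vertex_region"


def build_tuner_custom_config(tuning_args: dict, trials_dir: str, region: str) -> dict:
    """Hypothetical helper: assemble the Tuner custom_config with the Vertex keys set."""
    return {
        TUNING_ARGS_KEY: tuning_args,
        REMOTE_TRIALS_WORKING_DIR_KEY: trials_dir,
        ENABLE_VERTEX_KEY: True,
        VERTEX_REGION_KEY: region,
    }


custom_config = build_tuner_custom_config(
    tuning_args={"project": "gcp-ml-172005"},  # placeholder, not a complete job spec
    trials_dir=(
        "gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/"
        "resnet50-tfx-pipeline-tuner-test/trials"
    ),
    region="us-central1",
)

print(json.dumps(custom_config, indent=2))
```

The resulting dict would then be passed as `custom_config` to the Tuner extension component in place of the two-key version shown in the logs.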
On Cloud (KubeflowDagRunner + extension.Tuner) you can also just use KerasTuner, e.g., RandomSearch, in your tuner_fn. I want to know whether the issue in your workflow was with CloudTuner or with another part of the workflow.