Can't convert ImageNet to TFDS

What I need help with / What I was wondering

I need to run python -m tensorflow_datasets.scripts.download_and_prepare --datasets=imagenet2012 to convert the ImageNet dataset to the “tfds” format.

I have:

~/tensorflow_datasets/downloads/manual/ILSVRC2012_img_train.tar
~/tensorflow_datasets/downloads/manual/ILSVRC2012_img_val.tar
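
Before starting the long conversion, it can help to confirm that both archives are where they are expected to be. The following is only a minimal sketch, assuming the manual directory and the two file names listed above; the helper name missing_archives is hypothetical and is not part of tfds.

```python
from pathlib import Path

# File names taken from the listing above; adjust if your setup differs.
EXPECTED_ARCHIVES = ("ILSVRC2012_img_train.tar", "ILSVRC2012_img_val.tar")


def missing_archives(manual_dir, expected=EXPECTED_ARCHIVES):
    """Return the expected archive names that are not regular files in manual_dir.

    Hypothetical helper for illustration; it only inspects the filesystem.
    """
    manual_dir = Path(manual_dir).expanduser()
    return [name for name in expected if not (manual_dir / name).is_file()]
```

For example, missing_archives("~/tensorflow_datasets/downloads/manual") returns an empty list when both tarballs are present.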

I get a crash:

$ python -m tensorflow_datasets.scripts.download_and_prepare --datasets=imagenet2012
2020-07-09 12:42:44.810165: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1                      
I0709 12:42:46.014064 139951095191360 download_and_prepare.py:201] Running download_and_prepare for dataset(s):                                                        
imagenet2012                                                                                                                                                           
2020-07-09 12:42:46.039449: W tensorflow/core/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "Not found: Could not locate the credentials file.". Retrieving token from GCE failed with "Failed precondition: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'".                                                                                                  
I0709 12:42:46.465344 139951095191360 dataset_info.py:427] Load pre-computed DatasetInfo (eg: splits, num examples,...) from GCS: imagenet2012/5.0.0                                         
I0709 12:42:46.734692 139951095191360 dataset_info.py:358] Load dataset info from /tmp/tmpq9l_v2v7tfds                                                                                       
I0709 12:42:46.738695 139951095191360 dataset_info.py:398] Field info.description from disk and from code do not match. Keeping the one from code.                                           
I0709 12:42:46.738808 139951095191360 dataset_info.py:398] Field info.citation from disk and from code do not match. Keeping the one from code.                                              
I0709 12:42:46.739076 139951095191360 download_and_prepare.py:139] download_and_prepare for dataset imagenet2012/5.0.0...                                                                    
I0709 12:42:46.739398 139951095191360 dataset_builder.py:346] Generating dataset imagenet2012 (/home/bryanloz/tensorflow_datasets/imagenet2012/5.0.0)                                        
Downloading and preparing dataset imagenet2012/5.0.0 (download: 144.02 GiB, generated: Unknown size, total: 144.02 GiB) to /home/bryanloz/tensorflow_datasets/imagenet2012/5.0.0...          
I0709 12:42:49.851376 139951095191360 dataset_builder.py:947] Generating split train                                                                                                         
76809 examples [01:11, 1230.11 examples/s]2020-07-09 12:44:01.563442: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1       
2020-07-09 12:44:01.758785: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:                                                                         
pciBusID: 0000:3b:00.0 name: Tesla V100-PCIE-16GB computeCapability: 7.0                                                                                                                     
coreClock: 1.38GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s                                                                                               
2020-07-09 12:44:01.760404: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 1 with properties:                                                                         
pciBusID: 0000:af:00.0 name: Tesla V100-PCIE-16GB computeCapability: 7.0                                                                                                                     
coreClock: 1.38GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s                                                                                               
2020-07-09 12:44:01.761609: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 2 with properties:                                                                         
pciBusID: 0000:d8:00.0 name: Tesla V100-PCIE-16GB computeCapability: 7.0                                                                                                                     
coreClock: 1.38GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s                                                                                               
2020-07-09 12:44:01.761638: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1                                            
2020-07-09 12:44:01.763099: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10                                              
2020-07-09 12:44:01.764466: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10                                               
2020-07-09 12:44:01.768670: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10                                              
2020-07-09 12:44:01.807622: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10                                            
2020-07-09 12:44:01.809431: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10                                            
2020-07-09 12:44:01.815753: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7                                                
2020-07-09 12:44:01.827543: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0, 1, 2                                                                     
2020-07-09 12:44:01.828474: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA                                                                                                                          
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.                                                                                                  
2020-07-09 12:44:01.868361: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2100000000 Hz                                                                          
2020-07-09 12:44:01.878461: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x557610c82a00 initialized for platform Host (this does not guarantee that XLA will be used). Devices:                                                                                                                                                                                          
2020-07-09 12:44:01.878515: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version                                                             
2020-07-09 12:44:01.900848: W tensorflow/compiler/xla/service/platform_util.cc:210] unable to create StreamExecutor for CUDA:1: failed initializing StreamExecutor for CUDA device ordinal 1: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OPERATING_SYSTEM: OS call failed or operation not supported on this OS                                                        
2020-07-09 12:44:02.069513: W tensorflow/compiler/xla/service/platform_util.cc:210] unable to create StreamExecutor for CUDA:2: failed initializing StreamExecutor for CUDA device ordinal 2: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OPERATING_SYSTEM: OS call failed or operation not supported on this OS                                                        
2020-07-09 12:44:02.069596: W tensorflow/compiler/xla/service/platform_util.cc:210] unable to create StreamExecutor for CUDA:0: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OPERATING_SYSTEM: OS call failed or operation not supported on this OS                                                        
2020-07-09 12:44:02.069938: I tensorflow/compiler/jit/xla_gpu_device.cc:161] Ignoring visible XLA_GPU_JIT device. Device number is 0, reason: Internal: no supported devices found for platform CUDA                                                                                                                                                                                      
2020-07-09 12:44:02.074957: W tensorflow/compiler/xla/service/platform_util.cc:210] unable to create StreamExecutor for CUDA:0: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OPERATING_SYSTEM: OS call failed or operation not supported on this OS                                                        
2020-07-09 12:44:02.214344: W tensorflow/compiler/xla/service/platform_util.cc:210] unable to create StreamExecutor for CUDA:2: failed initializing StreamExecutor for CUDA device ordinal 2: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OPERATING_SYSTEM: OS call failed or operation not supported on this OS                                                        
2020-07-09 12:44:02.214487: W tensorflow/compiler/xla/service/platform_util.cc:210] unable to create StreamExecutor for CUDA:1: failed initializing StreamExecutor for CUDA device ordinal 1: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OPERATING_SYSTEM: OS call failed or operation not supported on this OS                                                        
2020-07-09 12:44:02.214891: I tensorflow/compiler/jit/xla_gpu_device.cc:161] Ignoring visible XLA_GPU_JIT device. Device number is 1, reason: Internal: no supported devices found for platform CUDA                                                                                                                                                                                      
2020-07-09 12:44:02.220597: W tensorflow/compiler/xla/service/platform_util.cc:210] unable to create StreamExecutor for CUDA:0: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OPERATING_SYSTEM: OS call failed or operation not supported on this OS                                                        
2020-07-09 12:44:02.376580: W tensorflow/compiler/xla/service/platform_util.cc:210] unable to create StreamExecutor for CUDA:1: failed initializing StreamExecutor for CUDA device ordinal 1: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OPERATING_SYSTEM: OS call failed or operation not supported on this OS                                                        
2020-07-09 12:44:02.376694: W tensorflow/compiler/xla/service/platform_util.cc:210] unable to create StreamExecutor for CUDA:2: failed initializing StreamExecutor for CUDA device ordinal 2: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OPERATING_SYSTEM: OS call failed or operation not supported on this OS                                                        
2020-07-09 12:44:02.377093: I tensorflow/compiler/jit/xla_gpu_device.cc:161] Ignoring visible XLA_GPU_JIT device. Device number is 2, reason: Internal: no supported devices found for platform CUDA
2020-07-09 12:44:02.518009: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Internal: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OPERATING_SYSTEM: OS call failed or operation not supported on this OS
Fatal Python error: Aborted

Thread 0x00007f4786554700 (most recent call first):
  File "/scratch/bryanloz/anaconda3/envs/tf22/lib/python3.7/threading.py", line 300 in wait
  File "/scratch/bryanloz/anaconda3/envs/tf22/lib/python3.7/threading.py", line 552 in wait
  File "/scratch/bryanloz/anaconda3/envs/tf22/lib/python3.7/site-packages/tqdm/_monitor.py", line 69 in run
  File "/scratch/bryanloz/anaconda3/envs/tf22/lib/python3.7/threading.py", line 917 in _bootstrap_inner
  File "/scratch/bryanloz/anaconda3/envs/tf22/lib/python3.7/threading.py", line 885 in _bootstrap

Current thread 0x00007f48e7509740 (most recent call first):
  File "/scratch/bryanloz/anaconda3/envs/tf22/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 539 in ensure_initialized
  File "/scratch/bryanloz/anaconda3/envs/tf22/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 97 in convert_to_eager_tensor
  File "/scratch/bryanloz/anaconda3/envs/tf22/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 300 in _constant_eager_impl
  File "/scratch/bryanloz/anaconda3/envs/tf22/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 275 in _constant_impl
  File "/scratch/bryanloz/anaconda3/envs/tf22/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 264 in constant
  File "/scratch/bryanloz/anaconda3/envs/tf22/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 338 in _constant_tensor_conversion_function
  File "/scratch/bryanloz/anaconda3/envs/tf22/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1525 in convert_to_tensor
  File "/scratch/bryanloz/anaconda3/envs/tf22/lib/python3.7/site-packages/tensorflow/python/ops/gen_image_ops.py", line 1241 in decode_jpeg_eager_fallback
  File "/scratch/bryanloz/anaconda3/envs/tf22/lib/python3.7/site-packages/tensorflow/python/ops/gen_image_ops.py", line 1177 in decode_jpeg
  File "/scratch/bryanloz/anaconda3/envs/tf22/lib/python3.7/site-packages/tensorflow_datasets/core/utils/tf_utils.py", line 77 in run
  File "/scratch/bryanloz/anaconda3/envs/tf22/lib/python3.7/site-packages/tensorflow_datasets/core/utils/image_utils.py", line 54 in jpeg_cmyk_to_rgb
  File "/scratch/bryanloz/anaconda3/envs/tf22/lib/python3.7/site-packages/tensorflow_datasets/image_classification/imagenet.py", line 176 in _fix_image
  File "/scratch/bryanloz/anaconda3/envs/tf22/lib/python3.7/site-packages/tensorflow_datasets/image_classification/imagenet.py", line 197 in _generate_examples
  File "/scratch/bryanloz/anaconda3/envs/tf22/lib/python3.7/site-packages/tqdm/std.py", line 1129 in __iter__
  File "/scratch/bryanloz/anaconda3/envs/tf22/lib/python3.7/site-packages/tensorflow_datasets/core/dataset_builder.py", line 1034 in _prepare_split
  File "/scratch/bryanloz/anaconda3/envs/tf22/lib/python3.7/site-packages/tensorflow_datasets/core/dataset_builder.py", line 951 in _download_and_prepare
  File "/scratch/bryanloz/anaconda3/envs/tf22/lib/python3.7/site-packages/tensorflow_datasets/core/dataset_builder.py", line 1019 in _download_and_prepare
  File "/scratch/bryanloz/anaconda3/envs/tf22/lib/python3.7/site-packages/tensorflow_datasets/core/dataset_builder.py", line 376 in download_and_prepare
  File "/scratch/bryanloz/anaconda3/envs/tf22/lib/python3.7/site-packages/tensorflow_datasets/core/api_utils.py", line 69 in disallow_positional_args_dec
  File "/scratch/bryanloz/anaconda3/envs/tf22/lib/python3.7/site-packages/tensorflow_datasets/scripts/download_and_prepare.py", line 156 in download_and_prepare
  File "/scratch/bryanloz/anaconda3/envs/tf22/lib/python3.7/site-packages/tensorflow_datasets/scripts/download_and_prepare.py", line 236 in main
  File "/scratch/bryanloz/anaconda3/envs/tf22/lib/python3.7/site-packages/absl/app.py", line 250 in _run_main
  File "/scratch/bryanloz/anaconda3/envs/tf22/lib/python3.7/site-packages/absl/app.py", line 299 in run
  File "/scratch/bryanloz/anaconda3/envs/tf22/lib/python3.7/site-packages/tensorflow_datasets/scripts/download_and_prepare.py", line 241 in <module>
  File "/scratch/bryanloz/anaconda3/envs/tf22/lib/python3.7/runpy.py", line 85 in _run_code
  File "/scratch/bryanloz/anaconda3/envs/tf22/lib/python3.7/runpy.py", line 193 in _run_module_as_main
Aborted (core dumped)

What I’ve tried so far

I’ve tried moving the tarballs from network storage to local storage, with no improvement.

It would be nice if…

It might be helpful if the documentation for tfds were a bit more verbose. What is tfds even doing to my tarballs?

Environment information (if applicable)

Ubuntu 18.04
Python version: 3.7.0
tensorflow-datasets/tfds-nightly version: (tfds-nightly) 3.1.0
tensorflow/tensorflow-gpu/tf-nightly/tf-nightly-gpu version: (tf-nightly) 2.4.0-dev20200709

Top GitHub Comments

Conchylicultor commented, Jul 10, 2020

TFDS preprocesses public data into a standard, uniform TFRecord format that can be loaded as an efficient tf.data.Dataset pipeline. I would recommend our introduction: https://www.tensorflow.org/datasets/overview

wilderfield commented, Jul 10, 2020

Closing because I am happy, but I think this issue could be helpful to others in the future. (If you see weird CUDA problems, try debugging by running with CPU only.)
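
One possible way to run the conversion with CPU only is to hide the GPUs from the process through the CUDA_VISIBLE_DEVICES environment variable. The sketch below is an assumption about how that could be done, not the exact steps the poster used; the helper names cpu_only_env, prepare_command and run_cpu_only are hypothetical, and the command itself is the one from the issue.

```python
import os
import subprocess
import sys


def cpu_only_env(base_env=None):
    """Return a copy of the environment in which no CUDA device is visible."""
    env = dict(os.environ if base_env is None else base_env)
    # "-1" is not a valid device index, so CUDA applications see no GPUs.
    env["CUDA_VISIBLE_DEVICES"] = "-1"
    return env


def prepare_command(dataset="imagenet2012"):
    """Build the download_and_prepare command shown in the issue."""
    return [
        sys.executable,
        "-m",
        "tensorflow_datasets.scripts.download_and_prepare",
        f"--datasets={dataset}",
    ]


def run_cpu_only(dataset="imagenet2012"):
    """Run the conversion in a child process with GPUs hidden; return its exit code."""
    result = subprocess.run(prepare_command(dataset), env=cpu_only_env(), check=False)
    return result.returncode
```

Calling run_cpu_only() launches the same script in a subprocess that cannot see the GPUs. From a shell, prefixing the original command with CUDA_VISIBLE_DEVICES=-1 should have the same effect.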
