Imagenet: Allow to download validation set only.
Is your feature request related to a problem? Please describe.
To use the validation set of Imagenet2012 by calling tfds.load('imagenet2012', split='validation'), tfds requires that both the training and validation sets are downloaded.
Note that the training set is quite large and imagenet downloads are quite slow, so downloading and storing it when it is not used is undesirable.
Describe the solution you’d like
If only one of the two datasets (either training or validation) is found on the file system, tfds should only print a warning.
An error should be raised only if access to a split that is not stored is attempted, e.g. by calling tfds.load('imagenet2012', split='train+validation').
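The requested behavior can be sketched roughly as follows. This is only an illustration of the proposal, not how tfds is implemented: the helper names available_splits and check_split_request are hypothetical, the archive names are the ones used by the ImageNet 2012 manual download, and only plain split names joined by '+' are handled (no slicing syntax).

```python
import os
import warnings

# Archive names of the ImageNet 2012 manual download.
ARCHIVES = {
    "train": "ILSVRC2012_img_train.tar",
    "validation": "ILSVRC2012_img_val.tar",
}


def available_splits(manual_dir):
    """Return the splits whose archive exists; warn if only some are present."""
    found = {
        name
        for name, filename in ARCHIVES.items()
        if os.path.exists(os.path.join(manual_dir, filename))
    }
    missing = set(ARCHIVES) - found
    if found and missing:
        warnings.warn(
            "Only %s found in %s; %s will not be available."
            % (sorted(found), manual_dir, sorted(missing))
        )
    return found


def check_split_request(split, manual_dir):
    """Raise only if the requested split needs an archive that is not stored."""
    requested = {part.strip() for part in split.split("+")}
    unknown = requested - set(ARCHIVES)
    if unknown:
        raise ValueError("Unknown split(s): %s" % sorted(unknown))
    missing = requested - available_splits(manual_dir)
    if missing:
        raise FileNotFoundError(
            "Requested split(s) %s not found in %s" % (sorted(missing), manual_dir)
        )
    return sorted(requested)
```

With only ILSVRC2012_img_val.tar in the directory, check_split_request('validation', path) would warn and succeed, while check_split_request('train+validation', path) would raise.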
Issue Analytics
- State:
- Created 3 years ago
- Reactions:1
- Comments:5 (4 by maintainers)
Hey @MiWeiss, that would indeed be a good optimization, but currently TFDS loads a dataset by first going through all of the splits, downloading and preparing the necessary tfrecord files, and then providing the split requested by the user. This is done so that, if the user needs a different split in the future, they can get it without TFDS having to download that split at runtime. If we want to enhance it so that TFDS downloads and prepares each split only when the user requests it, it will need some major changes in the way tfds.core.dataset_builder.py and tfds.core.load.py work. @Conchylicultor What do you think about this?
Looks like a good solution, allowing us to close this now. Thanks for PR & Review @ibarrond @Conchylicultor.
Feel free to re-open the issue if I missed something and #2484 did not completely solve it.
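For readers following the thread, the difference between preparing every split up front and preparing only the requested split can be illustrated with a toy example. ToyBuilder and its lazy flag are hypothetical and exist only for this sketch; this is not the TFDS code and not necessarily how #2484 solved the issue.

```python
class ToyBuilder:
    """Toy stand-in for a dataset builder; not the TFDS implementation."""

    def __init__(self, sources, lazy=False):
        # sources maps a split name to a callable that "downloads" its examples.
        self.sources = sources
        self.lazy = lazy
        self.prepared = {}

    def _prepare(self, name):
        # Prepare a split once and cache the result.
        if name not in self.prepared:
            self.prepared[name] = list(self.sources[name]())

    def load(self, split):
        # Eager mode prepares every split before serving one;
        # lazy mode prepares only the split that was asked for.
        names = [split] if self.lazy else list(self.sources)
        for name in names:
            self._prepare(name)
        return self.prepared[split]
```

In eager mode, loading 'validation' also triggers the 'train' source; in lazy mode only the 'validation' source is called.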