ImageFolder(root) raises error if root contains empty subfolders
🐛 Describe the bug
When using torchvision.datasets.ImageFolder(root) on a root that has a subfolder containing no images, a FileNotFoundError is raised.
Example code (works in google colab):
!git clone --depth 1 https://github.com/alexeygrigorev/clothing-dataset-small clothing_dataset_small
!mkdir clothing_dataset_small/empty_subfolder_XYZ
print("############################### Found the following subfolders:")
!ls clothing_dataset_small
print("############################### Trying to create an ImageFolder...")
import torchvision
torchvision.datasets.ImageFolder('clothing_dataset_small')
Output of example code
Cloning into 'clothing_dataset_small'...
remote: Enumerating objects: 3818, done.
remote: Counting objects: 100% (3818/3818), done.
remote: Compressing objects: 100% (3818/3818), done.
remote: Total 3818 (delta 0), reused 3815 (delta 0), pack-reused 0
Receiving objects: 100% (3818/3818), 100.57 MiB | 36.22 MiB/s, done.
############################### Found the following subfolders:
empty_subfolder_XYZ LICENSE README.md test train validation
############################### Trying to create an ImageFolder...
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-19-d8d3727bcc2a> in <module>()
5 print("############################### Trying to create an ImageFolder...")
6 import torchvision
----> 7 torchvision.datasets.ImageFolder('clothing_dataset_small')
3 frames
/usr/local/lib/python3.7/dist-packages/torchvision/datasets/folder.py in make_dataset(directory, class_to_idx, extensions, is_valid_file)
100 if extensions is not None:
101 msg += f"Supported extensions are: {', '.join(extensions)}"
--> 102 raise FileNotFoundError(msg)
103
104 return instances
FileNotFoundError: Found no valid file for the classes .git, empty_subfolder_XYZ. Supported extensions are: .jpg, .jpeg, .png, .ppm, .bmp, .pgm, .tif, .tiff, .webp
Source of error
https://github.com/pytorch/vision/blob/22ff44fd14139f2a056ad52b9bd109bd958089f3/torchvision/datasets/folder.py#L97-L102
As this check was introduced by @pmeier, maybe he can describe the intuition behind it.
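For readers who cannot open the link, the behaviour of the check can be approximated as follows. This is a minimal stdlib-only sketch reconstructed from the traceback above, not torchvision's actual code: the helper names (has_valid_file, list_class_dirs, check_no_empty_classes, non_empty_class_dirs), the shortened extension list and the folder layout are all assumptions made for illustration. The last helper shows one possible workaround, which is to keep only the class directories that contain a valid file.

```python
import os
import tempfile

# Shortened extension list, assumed for illustration only.
EXTENSIONS = (".jpg", ".jpeg", ".png")


def has_valid_file(directory, extensions=EXTENSIONS):
    """Return True if any file below `directory` ends with an accepted extension."""
    for _dirpath, _dirnames, filenames in os.walk(directory):
        if any(name.lower().endswith(extensions) for name in filenames):
            return True
    return False


def list_class_dirs(root):
    """Treat every immediate subdirectory of `root` as a class."""
    return sorted(entry.name for entry in os.scandir(root) if entry.is_dir())


def check_no_empty_classes(root, extensions=EXTENSIONS):
    """Raise FileNotFoundError if a class directory holds no valid file."""
    empty = [c for c in list_class_dirs(root)
             if not has_valid_file(os.path.join(root, c), extensions)]
    if empty:
        msg = f"Found no valid file for the classes {', '.join(empty)}. "
        msg += f"Supported extensions are: {', '.join(extensions)}"
        raise FileNotFoundError(msg)


def non_empty_class_dirs(root, extensions=EXTENSIONS):
    """Possible workaround: keep only class directories with a valid file."""
    return [c for c in list_class_dirs(root)
            if has_valid_file(os.path.join(root, c), extensions)]


# Hypothetical layout mirroring the report: one populated and one empty folder.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "train"))
os.makedirs(os.path.join(root, "empty_subfolder_XYZ"))
open(os.path.join(root, "train", "shirt.jpg"), "wb").close()

try:
    check_no_empty_classes(root)
    error_message = None
except FileNotFoundError as exc:
    error_message = str(exc)

kept = non_empty_class_dirs(root)
print(error_message)  # names empty_subfolder_XYZ as the offending class
print(kept)           # ['train']
```

In torchvision releases where DatasetFolder exposes a find_classes method, a similar filter could presumably be applied by overriding that method in an ImageFolder subclass; it is worth checking the documentation of the installed version before relying on this.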
Versions
Collecting environment information…
PyTorch version: 1.10.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.12.0
Libc version: glibc-2.26
Python version: 3.7.12 (default, Sep 10 2021, 00:21:48) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.104+-x86_64-with-Ubuntu-18.04-bionic
Is CUDA available: False
CUDA runtime version: 11.1.105
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.0.5
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] torch==1.10.0+cu111
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.11.0
[pip3] torchvision==0.11.1+cu111
[conda] Could not collect
cc @pmeier
Issue Analytics
- Created 2 years ago
- Comments:24 (8 by maintainers)
Top GitHub Comments
You have convinced me that I should think of an ImageFolder differently: It should not be any folder containing some subdirectories with images, but rather a curated folder with exactly one subdirectory for each class and at least one valid file for each class. Thanks for making this clearer to me.
Hey @MalteEbner. IIRC, the motivation behind this is to avoid subtle errors. Otherwise you might get a “label gap”, because a directory is recognized as a category but has no samples. Take this setup:
Removing the check and running
now prints
Although you have only two categories, the model handling the data now needs to handle three categories.
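The label gap described in this comment can be sketched without torchvision. The snippet below is a hypothetical stand-in, not the commenter's setup: the directory names (a_cats, b_empty, c_dogs), the file names and the helper functions are assumptions, and the class discovery only imitates the idea of assigning one index per sorted subdirectory with no check for empty classes.

```python
import os
import tempfile

# Shortened extension list, assumed for illustration only.
EXTENSIONS = (".jpg", ".jpeg", ".png")


def find_classes(root):
    """Assign one index to every subdirectory of `root`, in sorted order."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return classes, {name: idx for idx, name in enumerate(classes)}


def collect_samples(root, class_to_idx):
    """Collect (path, label) pairs, deliberately without an empty-class check."""
    samples = []
    for name, idx in class_to_idx.items():
        class_dir = os.path.join(root, name)
        for fname in sorted(os.listdir(class_dir)):
            if fname.lower().endswith(EXTENSIONS):
                samples.append((os.path.join(class_dir, fname), idx))
    return samples


# Hypothetical layout: two populated class folders and one empty folder
# whose name sorts between them.
root = tempfile.mkdtemp()
layout = {"a_cats": ["1.jpg"], "b_empty": [], "c_dogs": ["2.jpg"]}
for name, files in layout.items():
    os.makedirs(os.path.join(root, name))
    for fname in files:
        open(os.path.join(root, name, fname), "wb").close()

classes, class_to_idx = find_classes(root)
samples = collect_samples(root, class_to_idx)
labels_seen = sorted({label for _path, label in samples})

print(class_to_idx)  # {'a_cats': 0, 'b_empty': 1, 'c_dogs': 2}
print(labels_seen)   # [0, 2]
```

Only labels 0 and 2 ever appear in the samples, yet len(classes) is 3, so a classifier sized from the class list would carry an output for a category that has no data.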