ImageFolder(root) raises error if root contains empty subfolders

🐛 Describe the bug

Calling torchvision.datasets.ImageFolder(root) on a root that has a subfolder containing no images raises a FileNotFoundError.

Example code (works in Google Colab):

!git clone --depth 1  https://github.com/alexeygrigorev/clothing-dataset-small clothing_dataset_small
!mkdir clothing_dataset_small/empty_subfolder_XYZ
print("############################### Found the following subfolders:")
!ls clothing_dataset_small
print("############################### Trying to create an ImageFolder...")
import torchvision
torchvision.datasets.ImageFolder('clothing_dataset_small')

Output of example code

Cloning into 'clothing_dataset_small'...
remote: Enumerating objects: 3818, done.
remote: Counting objects: 100% (3818/3818), done.
remote: Compressing objects: 100% (3818/3818), done.
remote: Total 3818 (delta 0), reused 3815 (delta 0), pack-reused 0
Receiving objects: 100% (3818/3818), 100.57 MiB | 36.22 MiB/s, done.
############################### Found the following subfolders:
empty_subfolder_XYZ  LICENSE  README.md  test  train  validation
############################### Trying to create an ImageFolder...
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-19-d8d3727bcc2a> in <module>()
      5 print("############################### Trying to create an ImageFolder...")
      6 import torchvision
----> 7 torchvision.datasets.ImageFolder('clothing_dataset_small')

3 frames
/usr/local/lib/python3.7/dist-packages/torchvision/datasets/folder.py in make_dataset(directory, class_to_idx, extensions, is_valid_file)
    100         if extensions is not None:
    101             msg += f"Supported extensions are: {', '.join(extensions)}"
--> 102         raise FileNotFoundError(msg)
    103 
    104     return instances

FileNotFoundError: Found no valid file for the classes .git, empty_subfolder_XYZ. Supported extensions are: .jpg, .jpeg, .png, .ppm, .bmp, .pgm, .tif, .tiff, .webp

Source of error

https://github.com/pytorch/vision/blob/22ff44fd14139f2a056ad52b9bd109bd958089f3/torchvision/datasets/folder.py#L97-L102

As this check was introduced by @pmeier, maybe he can explain the reasoning behind it.
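To find out in advance which subfolders would trigger this error, something like the following could be run before constructing the dataset. This is a minimal sketch that only approximates the check: `empty_class_dirs` is a hypothetical helper, not part of torchvision, and the extension tuple is copied from the error message above rather than imported from the library.

```python
import os
import tempfile

# Extensions copied from the error message above (an assumption, not imported from torchvision).
IMG_EXTENSIONS = (".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tiff", ".webp")


def empty_class_dirs(root, extensions=IMG_EXTENSIONS):
    """Return the sorted names of subfolders of `root` that hold no matching file (hypothetical helper)."""
    empty = []
    for entry in os.scandir(root):
        if not entry.is_dir():
            continue  # plain files such as LICENSE are not treated as classes
        # Search the subfolder recursively for at least one file with a supported extension.
        has_valid_file = any(
            name.lower().endswith(extensions)
            for _, _, names in os.walk(entry.path)
            for name in names
        )
        if not has_valid_file:
            empty.append(entry.name)
    return sorted(empty)


# Small demo on a throwaway directory instead of the cloned repository.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "train"))
    open(os.path.join(root, "train", "shirt.jpg"), "w").close()
    os.makedirs(os.path.join(root, "empty_subfolder_XYZ"))
    demo_empty = empty_class_dirs(root)

print(demo_empty)  # ['empty_subfolder_XYZ']
```

Hidden directories such as .git are reported too, which matches the error message in the traceback.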

Versions

Collecting environment information…
PyTorch version: 1.10.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.12.0
Libc version: glibc-2.26

Python version: 3.7.12 (default, Sep 10 2021, 00:21:48) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.104+-x86_64-with-Ubuntu-18.04-bionic
Is CUDA available: False
CUDA runtime version: 11.1.105
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.0.5
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] torch==1.10.0+cu111
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.11.0
[pip3] torchvision==0.11.1+cu111
[conda] Could not collect

cc @pmeier


Top GitHub Comments

MalteEbner commented, Nov 16, 2021

You have convinced me that I should think of an ImageFolder differently: it should not be any folder containing some subdirectories with images, but rather a curated folder with exactly one subdirectory for each class and at least one valid file for each class. Thanks for making this clearer to me.
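When the folder cannot be curated on disk, one possible approach is to build the class mapping only from subfolders that actually contain a valid file. The sketch below is an illustration under assumptions: `curated_classes` is a hypothetical helper, and the extension tuple is copied from the error message in the issue. It returns a `(classes, class_to_idx)` pair, the same shape that `DatasetFolder.find_classes` returns in recent torchvision releases, so a subclass overriding that method could plausibly return it.

```python
import os
import tempfile

# Extensions copied from the error message in the issue (an assumption).
IMG_EXTENSIONS = (".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tiff", ".webp")


def curated_classes(root, extensions=IMG_EXTENSIONS):
    """Map each subfolder holding at least one matching file to a contiguous index (hypothetical helper)."""
    classes = sorted(
        entry.name
        for entry in os.scandir(root)
        if entry.is_dir()
        and any(
            name.lower().endswith(extensions)
            for _, _, names in os.walk(entry.path)
            for name in names
        )
    )
    return classes, {name: index for index, name in enumerate(classes)}


# Demo: two populated class folders and one empty folder.
with tempfile.TemporaryDirectory() as root:
    layout = {"dress": ["d1.jpg"], "empty_subfolder_XYZ": [], "shirt": ["s1.png"]}
    for folder, files in layout.items():
        os.makedirs(os.path.join(root, folder))
        for file_name in files:
            open(os.path.join(root, folder, file_name), "w").close()
    classes, class_to_idx = curated_classes(root)

print(classes)       # ['dress', 'shirt']
print(class_to_idx)  # {'dress': 0, 'shirt': 1}
```

Because empty folders are skipped before indices are assigned, the resulting labels stay contiguous.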

pmeier commented, Nov 12, 2021

Hey @MalteEbner. IIRC, the motivation behind this is to avoid subtle errors. Otherwise you might get a “label gap”, because a directory is recognized as a category but has no samples. Take this setup:

dataset
├── a
│   └── a.png
├── b
└── c
    └── c.png
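The gap can also be illustrated without touching torchvision. The following sketch recreates the layout above in a temporary directory and assumes that class indices are assigned in sorted directory order; the variable names are illustrative only.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Recreate the layout above: "b" exists but holds no samples.
    for folder, files in {"a": ["a.png"], "b": [], "c": ["c.png"]}.items():
        os.makedirs(os.path.join(root, folder))
        for file_name in files:
            open(os.path.join(root, folder, file_name), "w").close()

    # Assumption: every subdirectory becomes a class, indexed in sorted order.
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    class_to_idx = {name: index for index, name in enumerate(classes)}

    # Labels that actually occur among the samples.
    used_labels = sorted(
        class_to_idx[name] for name in classes if os.listdir(os.path.join(root, name))
    )

print(class_to_idx)  # {'a': 0, 'b': 1, 'c': 2}
print(used_labels)   # [0, 2]
```

Three classes are declared, but label 1 never appears in the data, so anything sized by the number of classes would silently carry an unused output.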

Removing the check and running

from torchvision.datasets.folder import ImageFolder
import pathlib

dataset = ImageFolder("dataset", loader=lambda path: pathlib.Path(path).name)

for path, category in dataset:
    print(path, category)

now prints