ImageFolder(root) raises error if root contains empty subfolders

🐛 Describe the bug

When torchvision.datasets.ImageFolder(root) is called on a root that has a subfolder containing no images, a FileNotFoundError is raised.

Example code (works in Google Colab):

!git clone --depth 1  https://github.com/alexeygrigorev/clothing-dataset-small clothing_dataset_small
!mkdir clothing_dataset_small/empty_subfolder_XYZ
print("############################### Found the following subfolders:")
!ls clothing_dataset_small
print("############################### Trying to create an ImageFolder...")
import torchvision
torchvision.datasets.ImageFolder('clothing_dataset_small')

Output of example code

Cloning into 'clothing_dataset_small'...
remote: Enumerating objects: 3818, done.
remote: Counting objects: 100% (3818/3818), done.
remote: Compressing objects: 100% (3818/3818), done.
remote: Total 3818 (delta 0), reused 3815 (delta 0), pack-reused 0
Receiving objects: 100% (3818/3818), 100.57 MiB | 36.22 MiB/s, done.
############################### Found the following subfolders:
empty_subfolder_XYZ  LICENSE  README.md  test  train  validation
############################### Trying to create an ImageFolder...
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-19-d8d3727bcc2a> in <module>()
      5 print("############################### Trying to create an ImageFolder...")
      6 import torchvision
----> 7 torchvision.datasets.ImageFolder('clothing_dataset_small')

3 frames
/usr/local/lib/python3.7/dist-packages/torchvision/datasets/folder.py in make_dataset(directory, class_to_idx, extensions, is_valid_file)
    100         if extensions is not None:
    101             msg += f"Supported extensions are: {', '.join(extensions)}"
--> 102         raise FileNotFoundError(msg)
    103 
    104     return instances

FileNotFoundError: Found no valid file for the classes .git, empty_subfolder_XYZ. Supported extensions are: .jpg, .jpeg, .png, .ppm, .bmp, .pgm, .tif, .tiff, .webp

Source of error

https://github.com/pytorch/vision/blob/22ff44fd14139f2a056ad52b9bd109bd958089f3/torchvision/datasets/folder.py#L97-L102

As this check was introduced by @pmeier, maybe he can describe the intuition behind it.
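One possible workaround, sketched here under assumptions, is to decide up front which subfolders count as classes and skip those without any valid file. The helper name `find_nonempty_classes` is hypothetical and not part of torchvision. The extension tuple is copied from the error message above. The walk-and-match logic only approximates what `make_dataset` does.

```python
import os
from typing import Dict, List, Tuple

# Extensions copied from the FileNotFoundError message above.
IMG_EXTENSIONS = (
    ".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tiff", ".webp",
)


def find_nonempty_classes(
    directory: str, extensions: Tuple[str, ...] = IMG_EXTENSIONS
) -> Tuple[List[str], Dict[str, int]]:
    """Return (classes, class_to_idx) for subfolders holding at least one valid file."""
    classes = []
    for entry in sorted(os.scandir(directory), key=lambda e: e.name):
        if not entry.is_dir():
            continue  # plain files such as LICENSE or README.md are not classes
        # Search the subfolder recursively for any file with an allowed extension.
        has_valid_file = any(
            name.lower().endswith(extensions)
            for _, _, files in os.walk(entry.path)
            for name in files
        )
        if has_valid_file:
            classes.append(entry.name)
    if not classes:
        raise FileNotFoundError(f"Couldn't find any non-empty class folder in {directory}.")
    # Indices are assigned after filtering, so they stay contiguous.
    return classes, {name: idx for idx, name in enumerate(classes)}
```

To plug this in, a subclass of ImageFolder could override `find_classes(self, directory)` to return `find_nonempty_classes(directory)`. The torchvision docs describe that method as overridable. Folders such as .git and empty_subfolder_XYZ would then be ignored rather than raising. I believe more recent torchvision releases also document an `allow_empty` argument, so check the docs for your installed version before relying on either route.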

Versions

Collecting environment information...
PyTorch version: 1.10.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.12.0
Libc version: glibc-2.26

Python version: 3.7.12 (default, Sep 10 2021, 00:21:48) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.104+-x86_64-with-Ubuntu-18.04-bionic
Is CUDA available: False
CUDA runtime version: 11.1.105
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.0.5
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] torch==1.10.0+cu111
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.11.0
[pip3] torchvision==0.11.1+cu111
[conda] Could not collect

cc @pmeier

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 24 (8 by maintainers)

Top GitHub Comments

3 reactions
MalteEbner commented, Nov 16, 2021

You have convinced me that I should think of an ImageFolder differently: It should not be any folder containing some subdirectories with images, but rather a curated folder with exactly one subdirectory for each class and at least one valid file for each class. Thanks for making this more clear to me.

3 reactions
pmeier commented, Nov 12, 2021

Hey @MalteEbner. IIRC, the motivation behind this is to avoid subtle errors. Otherwise you might get a “label gap”, because a directory is recognized as a category but has no samples. Take this setup:

dataset
├── a
│   └── a.png
├── b
└── c
    └── c.png

Removing the check and running

from torchvision.datasets.folder import ImageFolder
import pathlib

dataset = ImageFolder("dataset", loader=lambda path: pathlib.Path(path).name)

for path, category in dataset:
    print(path, category)

now prints

a.png 0
c.png 2

Although you have only two categories, the model handling the data now needs to handle three categories.
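A dataset that has already been built can be checked for such a gap by comparing the indices in the class mapping with the labels that actually occur. The sketch below uses plain Python data shaped like ImageFolder's `class_to_idx` and `samples` attributes. The helper name `classes_without_samples` is hypothetical, and the values mirror the a/b/c setup above.

```python
from typing import Dict, List, Sequence, Tuple


def classes_without_samples(
    class_to_idx: Dict[str, int], samples: Sequence[Tuple[str, int]]
) -> List[str]:
    """Return the class names whose index never appears as a sample label."""
    used_labels = {label for _, label in samples}
    return sorted(name for name, idx in class_to_idx.items() if idx not in used_labels)


# Data mirroring the example above: "b" is a class folder without any image.
class_to_idx = {"a": 0, "b": 1, "c": 2}
samples = [("dataset/a/a.png", 0), ("dataset/c/c.png", 2)]

missing = classes_without_samples(class_to_idx, samples)
print(missing)  # ['b']
```

With a real dataset object, the same call would presumably be `classes_without_samples(dataset.class_to_idx, dataset.samples)`. A non-empty result means the model would have to be sized for more categories than the data contains.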
