Feature Request for torchvision ImageFolder using/inheriting DatasetFolder


I came across a feature that other users have also asked for, as can be seen on the PyTorch forums. For a detailed description of the problem and how to solve it, see this discussion between other users, @ptrblck, and me: https://discuss.pytorch.org/t/how-to-sample-images-belonging-to-particular-classes/43776/9

In short: ImageFolder, which inherits from DatasetFolder, limits the user to loading a whole dataset from a folder instead of only some of the classes/directories in the folder structure. One can implement a custom find_classes() in a DatasetFolder subclass, but this option is often hidden from users, since they only use ImageFolder, which relies on DatasetFolder under the hood.

For users who get this wrong, see also the forum discussion linked above, where @ptrblck and I concluded that it would be nice to select a subset of a folder structure by passing an optional function directly to ImageFolder.

The line I am referring to in the current torchvision DatasetFolder implementation, where a subset of a folder can be retrieved by overriding this function: https://github.com/pytorch/vision/blob/fba4f42e3bc24b7b2c6cad09b6db653ac73dc6b7/torchvision/datasets/folder.py#L144

My suggestion for letting users load only a subset of a folder structure in ImageFolder looks as follows, as also stated in the PyTorch forum:

import os
from typing import Dict, List, Tuple

def find_classes(directory: str, desired_class_names: List[str]) -> Tuple[List[str], Dict[str, int]]:
    """Finds the class folders in a dataset, keeping only the desired ones."""
    classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())
    # Keep only the requested class folders (the original post marked this step as untested).
    classes = [cls_name for cls_name in classes if cls_name in desired_class_names]
    if not classes:
        raise FileNotFoundError(f"Couldn't find any class folder in {directory}.")

    class_to_idx = {cls_name: i for i, cls_name in enumerate(classes)}
    return classes, class_to_idx

The current implementation suggests overriding the function within DatasetFolder as follows, but as I inferred from posts, most users tend to use ImageFolder.

https://github.com/pytorch/vision/blob/fba4f42e3bc24b7b2c6cad09b6db653ac73dc6b7/torchvision/datasets/folder.py#L191-L218

Also, as stated, @ptrblck suggested making it possible to pass a function to ImageFolder directly instead of overriding DatasetFolder. I have no code to suggest for this, but it might be trivial to do by just passing parameters.

cc @pmeier

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
NicolasHug commented, Oct 20, 2021

I just saw that about 3.5k users viewed the post on the PyTorch forums I referred to, and many of my colleagues were not aware of this option.

I think that’s mostly because these discussions predate the 0.10 release which is only a few months old, where we made the overriding of find_classes() publicly available.

Since I have little experience with contributing to API documentation, could someone suggest how to proceed, if this would be a solution?

The current docs for ImageFolder are here: https://pytorch.org/vision/stable/datasets.html#torchvision.datasets.ImageFolder

where we say

This class inherits from DatasetFolder so the same methods can be overridden to customize the dataset.

If you think there’s a more obvious way to expose this, you’re welcome to submit a PR 😃

The actual file you’ll need to edit is https://github.com/pytorch/vision/blob/main/torchvision/datasets/folder.py#L271:L271

and our contributing guide is here 😃 https://github.com/pytorch/vision/blob/main/CONTRIBUTING.md

1 reaction
pmeier commented, Oct 20, 2021

Reminder to self: add functionality to exclude folders to torchvision.prototype.datasets.from_image_folder.

