Select subset of classes to sample from

🚀 Feature

When loading a dataset with ImageFolder, provide an optional argument to select a subset of classes.

Motivation

I deal with large (1000+) multi-class datasets upon which I train image classifiers. However, I usually don’t want to train for all the classes at the same time.

Pitch

I’d like to change the find_classes function https://github.com/pytorch/vision/blob/20a771e5143c6867eee63868c38a5bcc272a35e7/torchvision/datasets/folder.py#L61

to

classes = sorted(entry.name for entry in os.scandir(directory) 
                 if entry.is_dir() and (entry.name in allowed_classes or not allowed_classes))

where allowed_classes: List[str] = [] is an empty list by default, but it can be given to ImageFolder at initialisation time (it has to be propagated back to DatasetFolder, where find_classes is used).
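
As a rough illustration of the pitch, the filter can be sketched as a standalone function that depends only on the standard library. This is not torchvision's code: the allowed_classes parameter is the hypothetical argument proposed above, and the (classes, class_to_idx) return shape and the FileNotFoundError on an empty result are assumptions modelled on how find_classes is generally used.

```python
import os
from typing import Dict, Iterable, List, Optional, Tuple


def find_classes(
    directory: str,
    allowed_classes: Optional[Iterable[str]] = None,  # hypothetical parameter
) -> Tuple[List[str], Dict[str, int]]:
    """List class sub-directories, optionally keeping only the allowed names."""
    # None or an empty iterable means "no filtering".
    allowed = set(allowed_classes or ())
    classes = sorted(
        entry.name
        for entry in os.scandir(directory)
        if entry.is_dir() and (not allowed or entry.name in allowed)
    )
    if not classes:
        raise FileNotFoundError(f"Couldn't find any class folder in {directory}.")
    # Indices are assigned after filtering, so labels stay contiguous.
    class_to_idx = {cls_name: i for i, cls_name in enumerate(classes)}
    return classes, class_to_idx
```

Using None as the default instead of [] avoids a shared mutable default argument while keeping the "empty means all classes" behaviour.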

Alternatives

I tried to

  1. manually create a new folder structure with only the relevant classes. This gets messy quite fast, as I now have multiple versions of the same folders
  2. resort to a custom dataloader filtering loaded samples after initialisation (obviously this is very slow as soon as the number of images increases)

Additional context

Here I found a discussion on the same topic.

https://discuss.pytorch.org/t/how-to-sample-images-belonging-to-particular-classes/43776/8

cc @pmeier

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 10 (7 by maintainers)

Top GitHub Comments

NicolasHug commented, Apr 1, 2021

so that when overriding find_classes I could still re-use the original implementation

I agree with @pmeier, that’s not really an issue right now as find_classes is public. But if the method ever becomes more than that, you could always use super()

pmeier commented, Apr 1, 2021

After a bit more thought, IMO allowed_classes in the find_classes function makes little sense. If you already know which classes you want, there is no point in calling this function. On the other hand, an exclude parameter could make sense there. @alemelis since you seem to need a known set of classes, I agree with @NicolasHug that the best way to achieve it would be to override ImageFolder.find_classes.

Do you want to send a PR?
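
For readers following the thread, the override suggested in these comments might look roughly like the sketch below. It is written as a mixin so that it runs without torchvision installed; ClassSubsetMixin and its allowed_classes attribute are hypothetical names, and the only assumption about the base class is that it exposes find_classes(directory) returning (classes, class_to_idx).

```python
from typing import Dict, FrozenSet, List, Tuple


class ClassSubsetMixin:
    """Hypothetical mixin: keep only the classes named in `allowed_classes`.

    Place it before a base class that already provides
    find_classes(directory) -> (classes, class_to_idx).
    """

    # Empty means "keep every class"; subclasses override this attribute.
    allowed_classes: FrozenSet[str] = frozenset()

    def find_classes(self, directory: str) -> Tuple[List[str], Dict[str, int]]:
        # Reuse the base implementation via super(), then filter its result.
        classes, _ = super().find_classes(directory)
        if self.allowed_classes:
            classes = [c for c in classes if c in self.allowed_classes]
        if not classes:
            raise FileNotFoundError(f"No allowed class folder found in {directory}.")
        # Re-index so labels stay contiguous (0..k-1) after filtering.
        return classes, {name: i for i, name in enumerate(classes)}
```

Assuming a torchvision version in which find_classes is an overridable method, the mixin could be combined with ImageFolder as `class PetsFolder(ClassSubsetMixin, ImageFolder): allowed_classes = frozenset({"cat", "dog"})`. Because the allowed set is a class attribute, it is already available when the base constructor calls find_classes.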
