Select subset of classes to sample from
🚀 Feature
When loading a dataset with ImageFolder, provide an optional argument to select a subset of classes.
Motivation
I deal with large (1000+) multi-class datasets upon which I train image classifiers. However, I usually don’t want to train for all the classes at the same time.
Pitch
I’d like to change the find_classes function
https://github.com/pytorch/vision/blob/20a771e5143c6867eee63868c38a5bcc272a35e7/torchvision/datasets/folder.py#L61
to
classes = sorted(entry.name for entry in os.scandir(directory)
if entry.is_dir() and (entry.name in allowed_classes or not allowed_classes))
where allowed_classes: Optional[List[str]] = [] is an empty list by default, but it can be given to ImageFolder at initialisation time (it has to be propagated back to DatasetFolder, where find_classes is used).
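To make the pitch concrete, here is a minimal standalone sketch of what such a function could look like. It is modeled on the behaviour of torchvision's find_classes (sorted class folders plus a class-to-index mapping) but is not the library's code; the allowed_classes parameter is the hypothetical argument proposed above, and None is used as the default instead of a mutable empty list.

```python
import os
from typing import Dict, List, Optional, Sequence, Tuple


def find_classes(
    directory: str, allowed_classes: Optional[Sequence[str]] = None
) -> Tuple[List[str], Dict[str, int]]:
    """Find class folders in `directory`, optionally keeping only `allowed_classes`.

    `allowed_classes` is a hypothetical parameter, not part of torchvision.
    """
    # None (or an empty sequence) means "keep every class folder".
    allowed = set(allowed_classes) if allowed_classes else None
    classes = sorted(
        entry.name
        for entry in os.scandir(directory)
        if entry.is_dir() and (allowed is None or entry.name in allowed)
    )
    if not classes:
        raise FileNotFoundError(f"Couldn't find any class folder in {directory}.")
    # Indices are assigned over the kept classes only, so labels stay contiguous.
    class_to_idx = {cls_name: i for i, cls_name in enumerate(classes)}
    return classes, class_to_idx
```

Note that the class indices are recomputed over the selected subset, so the same class can map to a different label depending on which classes are allowed.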
Alternatives
I tried to:
- manually create a new folder structure with only the relevant classes. This gets messy quite fast, as I now have multiple versions of the same folders
- resort to a custom dataloader that filters loaded samples after initialisation (this becomes very slow as the number of images increases)
Additional context
Here I found a discussion on the same topic.
https://discuss.pytorch.org/t/how-to-sample-images-belonging-to-particular-classes/43776/8
cc @pmeier
Issue Analytics
- Created 2 years ago
- Comments: 10 (7 by maintainers)
I agree with @pmeier, that’s not really an issue right now as find_classes is public. But if the method ever becomes more than that, you could always use super().
After a bit more thought, IMO allowed_classes in the find_classes function makes little sense. If you already know which classes you want, there is no point in calling this function. On the other hand, an exclude parameter could make sense there. @alemelis since you seem to need a known set of classes, I agree with @NicolasHug that the best way to achieve it would be to override ImageFolder.find_classes. Do you want to send a PR?
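For reference, a rough sketch of the override approach suggested here. AllowedClassesMixin and its allowed_classes argument are hypothetical names, not part of torchvision. The sketch assumes the base class exposes a public find_classes(directory) method that returns (classes, class_to_idx) and calls it from its own __init__, which is why the allowed set is stored before super().__init__() runs.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


class AllowedClassesMixin:
    """Hypothetical mixin that restricts find_classes to a known subset of classes."""

    def __init__(
        self,
        root: str,
        *args: Any,
        allowed_classes: Optional[Sequence[str]] = None,
        **kwargs: Any,
    ) -> None:
        # Must be set first: the base __init__ is assumed to call find_classes.
        self.allowed_classes = set(allowed_classes) if allowed_classes else None
        super().__init__(root, *args, **kwargs)

    def find_classes(self, directory: str) -> Tuple[List[str], Dict[str, int]]:
        # Reuse the base implementation for folder discovery.
        classes, _ = super().find_classes(directory)
        if self.allowed_classes is not None:
            missing = self.allowed_classes - set(classes)
            if missing:
                raise ValueError(f"Classes not found in {directory}: {sorted(missing)}")
            classes = [c for c in classes if c in self.allowed_classes]
        # Re-index so labels are contiguous over the kept classes.
        return classes, {name: i for i, name in enumerate(classes)}


# Intended usage (assumes torchvision is installed; paths and names are examples):
#
#   from torchvision.datasets import ImageFolder
#
#   class SubsetImageFolder(AllowedClassesMixin, ImageFolder):
#       pass
#
#   dataset = SubsetImageFolder("path/to/data", allowed_classes=["cat", "dog"])
```

Because only the class discovery step is overridden, samples from excluded folders are never indexed, which avoids filtering after the dataset has been built.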