New classification datasets support for FLAVA
To support our colleagues’ work on the FLAVA paper, and to foster collaborations in the multi-modal space, we would like to implement a few new datasets. Almost all of them are classification datasets but some also support other tasks like segmentation.
- Food 101 @jdsgomes #5119
- Stanford Cars @abhi-glitchhg #5166
- FGVC Aircraft @sallysyw #5178
- DTD. A good starting point is this PR from @pmeier #5115
- Oxford Pets. This one also comes with ROIs and segmentation masks, which would be nice to support. We could do something similar to CelebA with a `target_type` parameter. @pmeier #5116
- Flowers-102. @zhiqwang #5177
- EuroSAT @frgfm #5114
- GTSRB. The homepage is timing out for me, but download links can be found here @sumukhaithal6 #5117
- PCAM @NicolasHug https://github.com/pytorch/vision/pull/5203
- Clevr Counts. See also here for what we exactly need @pmeier #5130
- FER2013. This is a Kaggle dataset, so I’m not sure we’ll be able to support download. @pmeier #5120
- Sun397 @saswatpp #5132
- Country211. Apparently the download link is here @puhuk #5138
- Rendered SST2 @jdsgomes #5220
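For the Oxford Pets entry, a CelebA-style `target_type` parameter mostly comes down to validating the requested annotation kinds and returning them in the requested order. Below is a minimal, torchvision-independent sketch; the annotation names `category`, `bbox` and `segmentation` and the helper names are hypothetical, and the convention of returning a single target, a tuple, or `None` is modelled on my understanding of how CelebA behaves.

```python
from typing import Any, Dict, List, Optional, Sequence, Union

# Hypothetical annotation kinds for a dataset like Oxford Pets.
VALID_TARGET_TYPES = ("category", "bbox", "segmentation")


def normalize_target_types(target_type: Union[str, Sequence[str]]) -> List[str]:
    """Accept a single string or a sequence of strings and validate each entry."""
    types = [target_type] if isinstance(target_type, str) else list(target_type)
    for t in types:
        if t not in VALID_TARGET_TYPES:
            raise ValueError(f"unknown target_type {t!r}; expected one of {VALID_TARGET_TYPES}")
    return types


def select_targets(annotations: Dict[str, Any], target_type: Union[str, Sequence[str]]) -> Optional[Any]:
    """Return one target, a tuple of targets (in request order), or None if none were requested."""
    targets = [annotations[t] for t in normalize_target_types(target_type)]
    if not targets:
        return None
    return targets[0] if len(targets) == 1 else tuple(targets)
```

Accepting both a string and a list keeps the common single-target case simple while still allowing several annotations per sample.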
CC-ing @pmeier and @jdsgomes as previously discussed. We’re on a fairly short timeline for this work, and ideally we would get all these in by end of January 2022. I’m also wondering whether this is something our open source contributors @oke-aditya @frgfm @zhiqwang could be interested in 🚀?
Implementing a new dataset
Implementing a dataset consists of 2 main things:
- The dataset class with `root`, `split`, `transform` and `target_transform` parameters. When available we should also support a `download` parameter (from what I checked, most of these are downloadable, apart from maybe FER2013). See e.g. the MNIST class.
- A test class which will generate automatic tests, e.g. this one for MNIST.
If there’s some ambiguity in the choices to make, the reference to follow is VISSL, where most of these datasets are already supported.
For contributors
If you’re interested in taking one of the datasets above, please comment below with “I’m working on dataset X” so that others don’t pick the same! 😃
cc @pmeier
Top GitHub Comments
Looks like we’re all done
Thank you so much everyone who submitted a dataset, your help is much appreciated!
Tons of thanks to @pmeier in particular for all your help with submissions and the reviews!!
Dang, I’m a few seconds late. I’ll try PCAM then.