New classification datasets support for FLAVA


To support our colleagues’ work on the FLAVA paper, and to foster collaborations in the multi-modal space, we would like to implement a few new datasets. Almost all of them are classification datasets but some also support other tasks like segmentation.

Food 101 @jdsgomes #5119
Stanford Cars @abhi-glitchhg #5166
FGVC Aircraft @sallysyw #5178
DTD. A good starting point is this PR from @pmeier #5115
Oxford Pets. This one also comes with ROIs and segmentation masks, which would be nice to support. We could do something similar to CelebA with a target_type parameter. @pmeier #5116
Flowers-102. @zhiqwang #5177
EuroSAT @frgfm #5114
GTSRB. The homepage is timing out for me, but download links can be found here @sumukhaithal6 #5117
PCAM @NicolasHug https://github.com/pytorch/vision/pull/5203
Clevr Counts. See also here for what exactly we need @pmeier #5130
FER2013. This is a Kaggle dataset, so I’m not sure we’ll be able to support download ~(but maybe)~ @pmeier #5120
Sun397 @saswatpp #5132
Country211. Apparently the download link is here @puhuk #5138
Rendered SST2 @jdsgomes #5220
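
For the Oxford Pets item above, a `target_type` parameter in the style of CelebA could accept either a single name or a list of names and return one target or a tuple of targets accordingly. The following is only a rough sketch of that selection logic: the helper `select_targets`, the target names `"category"` and `"segmentation"`, and the `annotations` dict are all hypothetical and are not taken from the PR.

```python
from typing import Any, Dict, Sequence, Union

# Hypothetical target names, chosen only for this illustration.
_VALID_TARGET_TYPES = ("category", "segmentation")


def select_targets(
    annotations: Dict[str, Any],
    target_type: Union[str, Sequence[str]] = "category",
) -> Any:
    """Pick one or several targets for a sample, depending on target_type."""
    # Accept both a single string and a sequence of strings.
    types = [target_type] if isinstance(target_type, str) else list(target_type)
    for name in types:
        if name not in _VALID_TARGET_TYPES:
            raise ValueError(
                f"Unknown target_type {name!r}, expected one of {_VALID_TARGET_TYPES}"
            )
    if not types:
        # No target requested.
        return None
    targets = tuple(annotations[name] for name in types)
    # A single requested type yields the bare target; several yield a tuple.
    return targets[0] if len(targets) == 1 else targets
```

With this shape, `target_type="category"` keeps the plain classification use case simple, while `target_type=["category", "segmentation"]` returns both targets in the requested order.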

CC-ing @pmeier and @jdsgomes as previously discussed. We’re on a fairly short timeline for this work, and ideally we would get all these in by end of January 2022. I’m also wondering whether this is something that our open source contributors @oke-aditya @frgfm @zhiqwang could be interested in 🚀 ?

Implementing a new dataset

Implementing a dataset consists of 2 main things:

The dataset class with root, split, transform and target_transform parameters. When available we should also support a download parameter (from what I checked, most of these are downloadable, apart maybe from FER2013). See e.g. the MNIST class
A test class which will generate automatic tests, e.g. this one for MNIST.
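
The two pieces above can be sketched roughly as follows. This is a minimal, stdlib-only illustration and not the torchvision implementation: the class name `ToyClassification`, the `<root>/toy/<split>/<class name>/<file>` layout, and the byte-reading loader are assumptions made for the example. A real dataset would subclass torchvision's `VisionDataset`, open images with PIL, and plug into the existing test utilities instead of the hand-rolled fake data at the bottom.

```python
import pathlib
import tempfile
from typing import Any, Callable, Optional, Tuple


class ToyClassification:
    """Hypothetical dataset; expects <root>/toy/<split>/<class name>/<file>."""

    _SPLITS = ("train", "test")

    def __init__(
        self,
        root: str,
        split: str = "train",
        transform: Optional[Callable] = None,
        target_transform: Optional[Callable] = None,
        download: bool = False,
    ) -> None:
        if split not in self._SPLITS:
            raise ValueError(f"Unknown split {split!r}, expected one of {self._SPLITS}")
        self.transform = transform
        self.target_transform = target_transform
        self._data_folder = pathlib.Path(root) / "toy" / split

        if download:
            self._download()
        if not self._data_folder.is_dir():
            raise RuntimeError("Dataset not found. You can use download=True to download it")

        # Class names are the sorted sub-folder names; labels are their indices.
        self.classes = sorted(p.name for p in self._data_folder.iterdir() if p.is_dir())
        self._samples = [
            (path, label)
            for label, name in enumerate(self.classes)
            for path in sorted((self._data_folder / name).iterdir())
        ]

    def __len__(self) -> int:
        return len(self._samples)

    def __getitem__(self, idx: int) -> Tuple[Any, Any]:
        path, target = self._samples[idx]
        # Placeholder loader: a real image dataset would open the file with PIL here.
        sample = path.read_bytes()
        if self.transform is not None:
            sample = self.transform(sample)
        if self.target_transform is not None:
            target = self.target_transform(target)
        return sample, target

    def _download(self) -> None:
        # Nothing to do if the data is already on disk.
        if self._data_folder.is_dir():
            return
        # Assumption: this sketch has no URL; a real dataset would fetch and extract an archive.
        raise NotImplementedError("This sketch has no download URL")


if __name__ == "__main__":
    # Test idea: inject fake data into a temporary root, then check the dataset against it.
    with tempfile.TemporaryDirectory() as tmp:
        for name, num_files in (("cat", 2), ("dog", 1)):
            folder = pathlib.Path(tmp) / "toy" / "train" / name
            folder.mkdir(parents=True)
            for i in range(num_files):
                (folder / f"{i}.bin").write_bytes(b"fake")
        dataset = ToyClassification(tmp, split="train")
        assert len(dataset) == 3
        assert dataset.classes == ["cat", "dog"]
```

The point of the fake-data check is that tests never need the real archive: they only verify that the class finds the right number of samples and classes for whatever layout the test wrote to disk.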

If there’s some ambiguity in the choices to make, the reference to follow is VISSL, where most of these datasets are already supported.

For contributors

If you’re interested in taking one of the datasets above, please comment below with “I’m working on dataset X” so that others don’t pick the same! 😃

cc @pmeier

Issue Analytics

Created 2 years ago
Reactions: 7
Comments: 24 (16 by maintainers)

Top GitHub Comments

16 reactions

NicolasHug commented, Jan 20, 2022

Looks like we’re all done

Thank you so much everyone who submitted a dataset, your help is much appreciated!

Tons of thanks to @pmeier in particular for all your help with submissions and the reviews!!

10 reactions

fibbonnaci commented, Dec 17, 2021

I am taking the Food 101 now.

Dang, I’m a few seconds late. I’ll try PCAM then.
