[RFC] Batteries Included - Phase 2
🚀 The feature
Note: To track the progress of the project, check out this board.
This is the 2nd phase of TorchVision’s modernization project (see phase 1). We aim to keep TorchVision relevant by ensuring that it provides, off the shelf, all the necessary primitives, model architectures, and recipe utilities to produce SOTA results for the supported Computer Vision tasks.
1. New Primitives
To enable our users to reproduce the latest state-of-the-art research, we will enhance TorchVision with the following data augmentations, layers, losses, and other operators:
Data Augmentations
- AugMix - #5411
- Large Scale Jitter - #5435 #5446 #5559
- Fixed Size Crop - #5607
- Random Shortest Size - #5610
- Simple CopyPaste - #5825
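As an illustration of how these land in the library, AugMix shipped as a ready-to-use transform. Below is a minimal sketch, assuming torchvision >= 0.13 where AugMix lives under torchvision.transforms (the detection-oriented transforms such as Large Scale Jitter and Fixed Size Crop were added to the detection reference scripts):

```python
import torch
from torchvision import transforms

# AugMix as released in torchvision.transforms; severity and mixture_width
# are shown with their default values.
augmix = transforms.AugMix(severity=3, mixture_width=3)

# AugMix expects a PIL image or a uint8 tensor; this is a dummy image.
img = torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8)
augmented = augmix(img)
```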
Layers
Losses
Operators added in PyTorch Core
- Better EMA support in AveragedModel - https://github.com/pytorch/pytorch/pull/71763
- Add support for empty output in SyncBatchNorm - https://github.com/pytorch/pytorch/pull/74944
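For context, the AveragedModel improvements make it possible to maintain an EMA model as a thin wrapper around the training model. A minimal sketch, assuming a PyTorch version where AveragedModel accepts a custom avg_fn and the use_buffers flag added by the linked PR:

```python
import torch
from torch.optim.swa_utils import AveragedModel

model = torch.nn.Linear(10, 2)

# Exponential moving average: avg = decay * avg + (1 - decay) * current.
decay = 0.999

def ema_avg(avg_param, param, num_averaged):
    return decay * avg_param + (1.0 - decay) * param

# use_buffers=True also averages buffers such as BatchNorm running stats.
ema_model = AveragedModel(model, avg_fn=ema_avg, use_buffers=True)

# In the training loop, call this after each optimizer step:
ema_model.update_parameters(model)
```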
2. New Architectures & Model Iterations
To ensure that our users have access to the most popular SOTA models, we will add the following architectures along with pre-trained weights. Moreover, we will improve existing architectures with commonly adopted optimizations introduced in follow-up research:
Image Classification
Object Detection & Segmentation
- FCOS #4961 (see the inference sketch after this list)
- Post-paper optimizations for RetinaNet, FasterRCNN & MaskRCNN #5444
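FCOS ships with COCO pre-trained weights exposed through the multi-weight API. A minimal inference sketch, assuming torchvision >= 0.13:

```python
import torch
from torchvision.models.detection import fcos_resnet50_fpn, FCOS_ResNet50_FPN_Weights

weights = FCOS_ResNet50_FPN_Weights.COCO_V1
model = fcos_resnet50_fpn(weights=weights).eval()

# Detection models take a list of CHW float images scaled to [0, 1].
images = [torch.rand(3, 480, 640)]
with torch.no_grad():
    detections = model(images)  # list of dicts with boxes, labels, scores
```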
Video Classification
3. Improved Training Recipes & Pre-trained models
To ensure that our users have access to strong baselines and SOTA weights, we will improve our training recipes to incorporate the newly released primitives and offer improved pre-trained models:
Reference Scripts
- Update EMA to use PyTorch Core’s new implementation - #5469
- Add support for the new Detection primitives in the Reference Scripts - #5715
Pre-trained weights
- Improve the accuracy of Classification models - #5560 #5906 #5935 #6019
- Close the gap with SOTA for Object Detection & Segmentation models - #5756 #5763 #5773
- Add weakly-supervised weights for ViT and RegNets - #5714 #5722 #5732 #5721 #5793
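The weakly-supervised (SWAG) weights are exposed through the multi-weight API introduced in phase 1. A minimal sketch loading the SWAG ViT-B/16 weights, assuming torchvision >= 0.13 where this enum value was released:

```python
import torch
from torchvision.models import vit_b_16, ViT_B_16_Weights

# Weakly-supervised SWAG weights, fine-tuned end-to-end on ImageNet-1K.
weights = ViT_B_16_Weights.IMAGENET1K_SWAG_E2E_V1
model = vit_b_16(weights=weights).eval()

# Each weights enum bundles its own preprocessing pipeline.
preprocess = weights.transforms()
img = torch.randint(0, 256, (3, 500, 500), dtype=torch.uint8)  # dummy image
batch = preprocess(img).unsqueeze(0)
with torch.no_grad():
    logits = model(batch)
```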
Other Candidates
There are several other Operators (#5414), Losses (#2980), Augmentations (#3817) and Models (#2707) proposed by the community. Here are some potential candidates that we could implement depending on bandwidth. Contributions are welcome for any of the below:
- AutoAugment Detection code - #6224
- Deformable DETR
- Polynomial LR scheduler (upstream to Core; see the sketch after this list)
- Shortcut Regularizer (FX-based)
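Of these candidates, the polynomial LR scheduler has since been upstreamed to Core as torch.optim.lr_scheduler.PolynomialLR. A minimal sketch, assuming PyTorch >= 1.13:

```python
import torch
from torch.optim.lr_scheduler import PolynomialLR

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Decays the LR polynomially to zero over total_iters steps (power=1.0 is linear).
scheduler = PolynomialLR(optimizer, total_iters=100, power=1.0)

for _ in range(100):
    # Dummy loop; in practice call optimizer.step() after loss.backward().
    optimizer.step()
    scheduler.step()
```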
Top GitHub Comments
@datumbox I think Swin Transformer is a very popular model, so I am planning to add it to torchvision.
@datumbox sounds good 👍 I’ll get started on it and ping you once I have a POC ready.