Improve the accuracy of Classification models by using SOTA recipes and primitives

🚀 Feature

Update the weights of all pre-trained models to improve their accuracy.

Motivation

New Recipe + FixRes mitigations

torchrun --nproc_per_node=8 train.py --model $MODEL_NAME --batch-size 128 --lr 0.5 \
--lr-scheduler cosineannealinglr --lr-warmup-epochs 5 --lr-warmup-method linear \
--auto-augment ta_wide --epochs 600 --random-erase 0.1 --weight-decay 0.00002 \
--norm-weight-decay 0.0 --label-smoothing 0.1 --mixup-alpha 0.2 --cutmix-alpha 1.0 \
--train-crop-size 176 --model-ema --val-resize-size 232

Using a recipe that includes Warmup, Cosine Annealing, Label Smoothing, Mixup, Cutmix, Random Erasing, TrivialAugment, no BN weight decay, EMA, long training cycles and the optional FixRes mitigations, we are able to improve the ResNet50 accuracy by over 4.5 points. For more information on the training recipe, check here.
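
For illustration, here is a minimal sketch of how some of these primitives map onto standard PyTorch/torchvision APIs. The values mirror the flags above; the momentum, warmup start factor and EMA decay are illustrative assumptions, and Mixup/Cutmix are omitted for brevity (the reference train.py wires all of this together itself):

import torch
import torchvision
from torchvision import transforms

# Augmentation side of the recipe (normalization omitted for brevity).
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(176),    # --train-crop-size 176 (FixRes mitigation)
    transforms.RandomHorizontalFlip(),
    transforms.TrivialAugmentWide(),      # --auto-augment ta_wide
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.1),      # --random-erase 0.1
])

model = torchvision.models.resnet50()

# --norm-weight-decay 0.0: put normalization parameters in a no-weight-decay group.
norm_params, other_params = [], []
for module in model.modules():
    params = list(module.parameters(recurse=False))
    if isinstance(module, torch.nn.modules.batchnorm._BatchNorm):
        norm_params.extend(params)
    else:
        other_params.extend(params)

optimizer = torch.optim.SGD(
    [{"params": other_params, "weight_decay": 2e-5},   # --weight-decay 0.00002
     {"params": norm_params, "weight_decay": 0.0}],    # --norm-weight-decay 0.0
    lr=0.5, momentum=0.9,                               # momentum is an assumed default
)

# Linear warmup for 5 epochs followed by cosine annealing for the remaining 595.
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer,
    schedulers=[
        torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.01, total_iters=5),
        torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=595),
    ],
    milestones=[5],
)

# --label-smoothing 0.1
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1)

# --model-ema: exponential moving average of the weights (decay value illustrative).
ema_model = torch.optim.swa_utils.AveragedModel(
    model, avg_fn=lambda ema, new, n: 0.999 * ema + 0.001 * new)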

Old ResNet50:
Acc@1 76.130 Acc@5 92.862

New ResNet50:
Acc@1 80.674 Acc@5 95.166

Running other models through the same recipe achieves the following improved accuracies:

ResNet101:
Acc@1 81.728 Acc@5 95.670

ResNet152:
Acc@1 82.042 Acc@5 95.926

ResNeXt50_32x4d:
Acc@1 81.116 Acc@5 95.478

ResNeXt101_32x8d:
Acc@1 82.834 Acc@5 96.228

MobileNetV3 Large:
Acc@1 74.938 Acc@5 92.496

Wide ResNet50_2:
Acc@1 81.602 Acc@5 95.758 (@prabhat00155)

Wide ResNet101_2:
Acc@1 82.492 Acc@5 96.110 (@prabhat00155)

regnet_x_400mf:
Acc@1 74.864 Acc@5 92.322 (@kazhang)

regnet_x_800mf:
Acc@1 77.522 Acc@5 93.826 (@kazhang)

regnet_x_1_6gf:
Acc@1 79.668 Acc@5 94.922 (@kazhang)

New Recipe (without FixRes mitigations)

torchrun --nproc_per_node=8 train.py --model $MODEL_NAME --batch-size 128 --lr 0.5 \
--lr-scheduler cosineannealinglr --lr-warmup-epochs 5 --lr-warmup-method linear \
--auto-augment ta_wide --epochs 600 --random-erase 0.1 --weight-decay 0.00002 \
--norm-weight-decay 0.0 --label-smoothing 0.1 --mixup-alpha 0.2 --cutmix-alpha 1.0 \
--model-ema --val-resize-size 232

Removing the optional FixRes mitigations (i.e. training at the default crop size; see the short sketch after the results below) seems to yield better results for some deeper architectures and variants with larger receptive fields:

ResNet101:
Acc@1 81.886 Acc@5 95.780

ResNet152:
Acc@1 82.284 Acc@5 96.002

ResNeXt50_32x4d:
Acc@1 81.198 Acc@5 95.340

ResNeXt101_32x8d:
Acc@1 82.812 Acc@5 96.226

MobileNetV3 Large:
Acc@1 75.152 Acc@5 92.634

Wide ResNet50_2:
Acc@1 81.452 Acc@5 95.544 (@prabhat00155)

Wide ResNet101_2:
Acc@1 82.510 Acc@5 96.020 (@prabhat00155)

regnet_x_3_2gf:
Acc@1 81.196 Acc@5 95.430

regnet_x_8gf:
Acc@1 81.682 Acc@5 95.678

regnet_x_16gf:
Acc@1 82.716 Acc@5 96.196

regnet_x_32gf:
Acc@1 83.014 Acc@5 96.288

regnet_y_400mf:
Acc@1 75.804 Acc@5 92.742

regnet_y_800mf:
Acc@1 78.828 Acc@5 94.502

regnet_y_1_6gf:
Acc@1 80.876 Acc@5 95.444

regnet_y_3_2gf:
Acc@1 81.982 Acc@5 95.972

regnet_y_8gf:
Acc@1 82.828 Acc@5 96.330

regnet_y_16gf:
Acc@1 82.886 Acc@5 96.328

regnet_y_32gf:
Acc@1 83.368 Acc@5 96.498
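
For context, the FixRes mitigation in the previous recipe simply trains on smaller crops than the ones used at evaluation time. A minimal illustration with torchvision transforms follows; the 232 resize comes from the flags above, while the 224 validation crop size is an assumed standard value:

from torchvision import transforms

# With the FixRes mitigation: train on smaller 176x176 crops (--train-crop-size 176)...
train_with_fixres = transforms.RandomResizedCrop(176)
# ...but validate at the usual resolution: resize to 232 (--val-resize-size 232),
# then take the (assumed) standard 224x224 center crop.
val_transform = transforms.Compose([transforms.Resize(232), transforms.CenterCrop(224)])
# Without the mitigation, training falls back to the default 224x224 crops.
train_without_fixres = transforms.RandomResizedCrop(224)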

New Recipe + Regularization tuning

torchrun --nproc_per_node=8 train.py --model $MODEL_NAME --batch-size 128 --lr 0.5 \
--lr-scheduler cosineannealinglr --lr-warmup-epochs 5 --lr-warmup-method linear \
--auto-augment ta_wide --epochs 600 --random-erase 0.1 --weight-decay 0.00001 \
--norm-weight-decay 0.0 --label-smoothing 0.1 --mixup-alpha 0.2 --cutmix-alpha 1.0 \
--model-ema --val-resize-size 232

Slightly adjusting the regularization can help us improve the following:

MobileNetV3 Large:
Acc@1 75.274 Acc@5 92.566

In addition to the regularization adjustment, we can also apply the Repeated Augmentation trick --ra-sampler --ra-reps 4:

MobileNetV2:
Acc@1 72.154 Acc@5 90.822

Post-Training Quantized models

ResNet50:
Acc@1 80.282 Acc@5 94.976

ResNeXt101_32x8d:
Acc@1 82.574 Acc@5 96.132
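
As a rough sketch of what an eager-mode post-training quantization flow looks like with torchvision's quantizable model builders: the calibration loader and the step that loads the retrained weights are assumed here, and the reference script has its own quantization entry point.

import torch
import torchvision

def post_training_quantize(calibration_loader):
    # Start from a quantization-ready FP32 ResNet50 (it contains Quant/DeQuant stubs);
    # loading the retrained weights into it is omitted in this sketch.
    model = torchvision.models.quantization.resnet50(quantize=False)
    model.eval()

    # Fuse Conv+BN(+ReLU) blocks and attach a server-class (fbgemm) qconfig.
    model.fuse_model()
    model.qconfig = torch.ao.quantization.get_default_qconfig("fbgemm")
    torch.ao.quantization.prepare(model, inplace=True)

    # Calibrate the observers on a few batches of training data.
    with torch.inference_mode():
        for images, _ in calibration_loader:
            model(images)

    # Convert to an INT8 model and evaluate it as usual.
    torch.ao.quantization.convert(model, inplace=True)
    return model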

New Recipe (LR+weight_decay+train_crop_size tuning)

torchrun --nproc_per_node=8 train.py --model $MODEL_NAME --batch-size 128 --lr 1 \
--lr-scheduler cosineannealinglr --lr-warmup-epochs 5 --lr-warmup-method linear \
--auto-augment ta_wide --epochs 600 --random-erase 0.1 --weight-decay 0.000002 \
--norm-weight-decay 0.0 --label-smoothing 0.1 --mixup-alpha 0.2 --cutmix-alpha 1.0 \
--train-crop-size 208 --model-ema --val-crop-size 240 --val-resize-size 255

EfficientNet-B1:
Acc@1 79.838 Acc@5 94.934

Pitch

To be able to improve the pre-trained model accuracy, we need to complete the “Batteries Included” work tracked in #3911. Moreover, we will need to extend our existing model builders to support multiple weights, as described in #4611 (a rough sketch of such an API follows the list below). Then we will be able to:

  • Update our reference scripts for classification to support the new primitives added by the “Batteries Included” initiative.
  • Find a good training recipe for the most important pre-trained models and re-train them. Note that different training configurations might be required for different types of models (for example, mobile models are less likely to overfit compared to bigger models and thus make use of different recipes/primitives).
  • Update the weights of the models in the library.
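
For illustration, the multi-weight builders proposed in #4611 would let the improved weights ship alongside the original ones, roughly along these lines (the enum and weight names below are illustrative, not part of the current API):

from torchvision.models import resnet50, ResNet50_Weights  # names illustrative, per #4611

# The original weights remain available...
model_v1 = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
# ...while the weights trained with the new recipe are exposed as a second version.
model_v2 = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)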

cc @datumbox @vfdev-5

Top GitHub Comments

tbennun commented on Jan 16, 2022 (7 reactions)

@datumbox As per the discussion in #5084, below is a recipe that achieved the following result on ResNet-50 and ImageNet: Acc@1 80.858 Acc@5 95.434

torchrun --nproc_per_node=8 train.py --model resnet50 --batch-size 128 --lr 0.5 \
--lr-scheduler cosineannealinglr --lr-warmup-epochs 5 --lr-warmup-method linear \
--auto-augment ta_wide --epochs 600 --random-erase 0.1 --weight-decay 0.00002 \
--norm-weight-decay 0.0 --label-smoothing 0.1 --mixup-alpha 0.2 --cutmix-alpha 1.0 \
--train-crop-size 176 --model-ema --val-resize-size 232 \
--ra-sampler --ra-reps=4

Overview of changes to the current recipe (New Recipe + FixRes mitigations):

  • Repeated Augmentation (--ra-sampler --ra-reps=4): In each batch, we sample 1/4 of the original batch size and reuse each sample 4 times with different data augmentations (taken from the same set of augmentations as the original recipe). Repeated Augmentation (RA, also called Batch Augmentation) has been used successfully to boost generalization on various models and datasets via gradient variance reduction; in particular, RA with four repetitions was used on ImageNet in prior literature. A simplified sketch of such a sampler appears after this list.
    • Reference used for four repeated augmentations: E. Hoffer et al. “Augment Your Batch: Improving Generalization Through Instance Repetition”, CVPR 2020.
  • Since I ran this on a 4-GPU node, I changed the number of processes per node to 4 and batch size to 256. It should be equivalent to the 8x128 batch size in the current recipe.
  • --cache-dataset (omitted from the command above) was also used to speed up the initial dataset loading time; it should have no effect on the recipe.
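
To make the sampling scheme concrete, below is a simplified single-process sketch of a repeated-augmentation sampler. The actual --ra-sampler implementation in the reference script additionally handles distributed sharding; the class and argument names here are illustrative.

import torch
from torch.utils.data import Sampler

class SimpleRASampler(Sampler):
    """Yields each selected index `repetitions` times so the DataLoader
    re-augments the same image several times within one epoch."""

    def __init__(self, dataset_len, repetitions=4, generator=None):
        self.dataset_len = dataset_len
        self.repetitions = repetitions
        self.generator = generator

    def __iter__(self):
        # Shuffle, then repeat every index `repetitions` times back-to-back;
        # keep only dataset_len entries so the epoch length stays the same
        # (i.e. roughly 1/4 of the images are seen, each 4 times per batch).
        perm = torch.randperm(self.dataset_len, generator=self.generator)
        repeated = perm.repeat_interleave(self.repetitions)
        yield from repeated[: self.dataset_len].tolist()

    def __len__(self):
        return self.dataset_len

# Usage (loader construction is illustrative):
# loader = torch.utils.data.DataLoader(dataset, batch_size=128,
#                                      sampler=SimpleRASampler(len(dataset)))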

datumbox commented on Nov 21, 2021 (3 reactions)

@xiaohu2015 Of course! I’m in the middle of writing a blogpost that will include the configs, the training methodology, detailed ablations etc. It should be out next week. 😃

Edit: Here is the blogpost that documents the training recipe.
