
I need some help to reproduce the DeiT-III finetuning results


Hi

Thank you for sharing the finetuning code and training logs. On IN-1k pretraining, I got results similar to your log: ViT-S 81.43 and ViT-B 82.88. But I failed to reproduce the finetuning performance, even with your official finetuning setting, so I would like to ask for advice or help.

Here is my fine-tuning result with ViT-B on IN-1k: [image attachment showing the fine-tuning results]

I expected the performance to increase as in your fine-tuning log, but instead fine-tuning degrades it. I can’t use submitit, so I used the following command on a single node with 8 A100 GPUs:

OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=${num_gpus_per_node} --nnodes=${WORLD_SIZE} --node_rank=${RANK}  --master_addr=${MASTER_ADDR}  --master_port=${MASTER_PORT} --use_env main.py \
    --model deit_base_patch16_LS \
    --data-path ${local_data_path} \
    --finetune ${SAVE_BASE_PATH}/pretraining/checkpoint-${epoch}.pth \
    --output_dir ${SAVE_BASE_PATH}/finetune4 \
    --batch-size 64 \
    --print_freq 400 \
    --epochs 20 \
    --smoothing 0.1 \
    --reprob 0.0 \
    --opt adamw \
    --lr 1e-5 \
    --weight-decay 0.1 \
    --input-size 224 \
    --drop 0.0 \
    --drop-path 0.2 \
    --mixup 0.8 \
    --cutmix 1.0 \
    --unscale-lr \
    --no-repeated-aug \
    --aa rand-m9-mstd0.5-inc1

and here are the full args printed on the command line:

Namespace(ThreeAugment=False, aa='rand-m9-mstd0.5-inc1', attn_only=False, auto_resume=True, batch_size=64, bce_loss=False, clip_grad=None, color_jitter=0.3, cooldown_epochs=10, cutmix=1.0, cutmix_minmax=None, data_path='/mnt/ddn/datasets/ILSVRC2015/train/Data/CLS-LOC', data_set='IMNET', decay_epochs=30, decay_rate=0.1, device='cuda', dist_backend='nccl', dist_eval=False, dist_url='env://', distillation_alpha=0.5, distillation_tau=1.0, distillation_type='none', distributed=True, drop=0.0, drop_path=0.2, epochs=20, eval=False, finetune='/mnt/backbone-nfs/bhheo/checkpoints/deit_codebase_deit_base_patch16_LS_800epoch_reproduce/pretraining/checkpoint-800.pth', gpu=0, inat_category='name', input_size=224, log_dir='nsmlv2', log_name='finetune', lr=1e-05, lr_noise=None, lr_noise_pct=0.67, lr_noise_std=1.0, min_lr=1e-05, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='deit_base_patch16_LS', model_ema=True, model_ema_decay=0.99996, model_ema_force_cpu=False, momentum=0.9, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='/mnt/backbone-nfs/bhheo/checkpoints/deit_codebase_deit_base_patch16_LS_800epoch_reproduce/finetune4', patience_epochs=10, pin_mem=True, print_freq=400, rank=0, recount=1, remode='pixel', repeated_aug=False, reprob=0.0, resplit=False, resume='', save_periods=['last2'], sched='cosine', seed=0, smoothing=0.1, src=False, start_epoch=0, teacher_model='regnety_160', teacher_path='', train_interpolation='bicubic', unscale_lr=True, warmup_epochs=5, warmup_lr=1e-06, weight_decay=0.1, world_size=8)
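
As a sanity check on the args above, and assuming deit’s usual linear lr-scaling rule (lr × global batch / 512), which --unscale-lr bypasses, the effective values work out as follows (illustrative sketch, not the upstream code):

    # Back-of-the-envelope check of the effective hyperparameters implied by the args above.
    # Assumption: a linear lr-scaling rule (lr * global_batch / 512) that --unscale-lr skips.
    per_gpu_batch = 64          # --batch-size
    world_size = 8              # 1 node x 8 GPUs
    base_lr = 1e-5              # --lr
    unscale_lr = True           # --unscale-lr

    global_batch = per_gpu_batch * world_size                              # 512 images per optimizer step
    effective_lr = base_lr if unscale_lr else base_lr * global_batch / 512.0
    print(global_batch, effective_lr)                                      # 512, 1e-05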

I think it is the same as your finetuning setting. I double-checked my code, but I still don’t know why the result is so different.

I’m using different library versions: torch 1.11.0a0+b6df043, torchvision 0.11.0a0, timm 0.5.4. This might cause some problems, but there was no problem in pretraining, and the performance gap seems too severe for a simple library-version issue.
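
A minimal way to record which versions each run actually used, to rule out a pretrain/finetune environment mismatch (illustrative only; torch also ships python -m torch.utils.collect_env for a fuller report):

    # Log the exact library versions of a run so environments can be compared later.
    import torch
    import torchvision
    import timm

    print("torch:", torch.__version__)
    print("torchvision:", torchvision.__version__)
    print("timm:", timm.__version__)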

I’m sorry to keep bothering you, but could you please let me know if there is something wrong with my setting? Or could you share the ViT-B weights pretrained on IN-1k at 192x192 resolution, without the 224x224 finetuning? With the weights from before finetuning, I could verify my finetuning code without doubting my pretraining.

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 23 (19 by maintainers)

Top GitHub Comments

4 reactions
bhheo commented on Jul 29, 2022

Hi

I got the result, and it is almost the same as the official log. set_training_mode=True solves the fine-tuning problem. [image attachment showing the updated fine-tuning results]

Thank you for your advice @TouvronHugo

Best Heo
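
For context on the fix: in the deit training loop, train_one_epoch takes a set_training_mode flag and, roughly, calls model.train(set_training_mode), so passing False keeps the backbone in eval mode and silently disables dropout and stochastic depth (--drop-path 0.2 in the run above). A minimal sketch of that mechanism, not the exact upstream code:

    import timm
    import torch

    # Hypothetical stand-in for the model in the issue (deit_base_patch16_LS with --drop-path 0.2).
    model = timm.create_model("vit_base_patch16_224", drop_path_rate=0.2)

    def train_one_epoch_sketch(model: torch.nn.Module, set_training_mode: bool = True) -> None:
        # The key line: with set_training_mode=False the model stays in eval mode,
        # so dropout and drop-path never fire during "training".
        model.train(set_training_mode)
        print("model.training =", model.training)

    train_one_epoch_sketch(model, set_training_mode=False)  # regularization silently off -> degraded finetuning
    train_one_epoch_sketch(model, set_training_mode=True)   # the fix reported above

With set_training_mode=True the regularizers are active again, which matches the recovery reported in the comment above.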

3 reactions
bhheo commented on Jul 28, 2022

@TouvronHugo Oh, that looks critical. I will test it ASAP.

Best Heo
