
I need some help to reproduce the DeiT-III finetuning results


Hi

Thank you for sharing the finetuning code and training logs. On IN-1k pretraining, I got results similar to your log: ViT-S 81.43 and ViT-B 82.88. But I failed to reproduce the finetuning performance, even with your official finetuning setting, so I would like to ask for advice or help.

Here is my fine-tuning result with ViT-B on IN-1k: [image attachment showing the fine-tuning results]

I expected the performance to increase as in your fine-tuning log, but instead fine-tuning degrades it. I can’t use submitit, so I used the following command on a single node with 8 A100 GPUs:

OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=${num_gpus_per_node} --nnodes=${WORLD_SIZE} --node_rank=${RANK}  --master_addr=${MASTER_ADDR}  --master_port=${MASTER_PORT} --use_env main.py \
    --model deit_base_patch16_LS \
    --data-path ${local_data_path} \
    --finetune ${SAVE_BASE_PATH}/pretraining/checkpoint-${epoch}.pth \
    --output_dir ${SAVE_BASE_PATH}/finetune4 \
    --batch-size 64 \
    --print_freq 400 \
    --epochs 20 \
    --smoothing 0.1 \
    --reprob 0.0 \
    --opt adamw \
    --lr 1e-5 \
    --weight-decay 0.1 \
    --input-size 224 \
    --drop 0.0 \
    --drop-path 0.2 \
    --mixup 0.8 \
    --cutmix 1.0 \
    --unscale-lr \
    --no-repeated-aug \
    --aa rand-m9-mstd0.5-inc1

and here are the full args printed on the command line:

Namespace(ThreeAugment=False, aa='rand-m9-mstd0.5-inc1', attn_only=False, auto_resume=True, batch_size=64, bce_loss=False, clip_grad=None, color_jitter=0.3, cooldown_epochs=10, cutmix=1.0, cutmix_minmax=None, data_path='/mnt/ddn/datasets/ILSVRC2015/train/Data/CLS-LOC', data_set='IMNET', decay_epochs=30, decay_rate=0.1, device='cuda', dist_backend='nccl', dist_eval=False, dist_url='env://', distillation_alpha=0.5, distillation_tau=1.0, distillation_type='none', distributed=True, drop=0.0, drop_path=0.2, epochs=20, eval=False, finetune='/mnt/backbone-nfs/bhheo/checkpoints/deit_codebase_deit_base_patch16_LS_800epoch_reproduce/pretraining/checkpoint-800.pth', gpu=0, inat_category='name', input_size=224, log_dir='nsmlv2', log_name='finetune', lr=1e-05, lr_noise=None, lr_noise_pct=0.67, lr_noise_std=1.0, min_lr=1e-05, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='deit_base_patch16_LS', model_ema=True, model_ema_decay=0.99996, model_ema_force_cpu=False, momentum=0.9, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='/mnt/backbone-nfs/bhheo/checkpoints/deit_codebase_deit_base_patch16_LS_800epoch_reproduce/finetune4', patience_epochs=10, pin_mem=True, print_freq=400, rank=0, recount=1, remode='pixel', repeated_aug=False, reprob=0.0, resplit=False, resume='', save_periods=['last2'], sched='cosine', seed=0, smoothing=0.1, src=False, start_epoch=0, teacher_model='regnety_160', teacher_path='', train_interpolation='bicubic', unscale_lr=True, warmup_epochs=5, warmup_lr=1e-06, weight_decay=0.1, world_size=8)
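
As a sanity check on the args above, and assuming deit’s usual linear lr-scaling rule (lr × global batch / 512), which --unscale-lr bypasses, the effective values work out as follows (illustrative sketch, not the upstream code):

    # Back-of-the-envelope check of the effective hyperparameters implied by the args above.
    # Assumption: a linear lr-scaling rule (lr * global_batch / 512) that --unscale-lr skips.
    per_gpu_batch = 64          # --batch-size
    world_size = 8              # 1 node x 8 GPUs
    base_lr = 1e-5              # --lr
    unscale_lr = True           # --unscale-lr

    global_batch = per_gpu_batch * world_size                              # 512 images per optimizer step
    effective_lr = base_lr if unscale_lr else base_lr * global_batch / 512.0
    print(global_batch, effective_lr)                                      # 512, 1e-05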

I think it is the same as your finetuning setting. I double-checked my code, but I still don’t know why the result is so different.

I’m using different library versions: torch 1.11.0a0+b6df043, torchvision 0.11.0a0, timm 0.5.4. This might cause some problems, but there was no problem in pretraining, and the performance gap seems too severe for a simple library-version issue.
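
A minimal way to record which versions each run actually used, to rule out a pretrain/finetune environment mismatch (illustrative only; torch also ships python -m torch.utils.collect_env for a fuller report):

    # Log the exact library versions of a run so environments can be compared later.
    import torch
    import torchvision
    import timm

    print("torch:", torch.__version__)
    print("torchvision:", torchvision.__version__)
    print("timm:", timm.__version__)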

I’m sorry to keep bothering you, but could you please let me know if there is something wrong with my setting? Or could you share the ViT-B weights pretrained on IN-1k at 192x192 resolution, without the 224x224 finetuning? With the weights from before finetuning, I could verify my finetuning code without doubting my pretraining.

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 23 (19 by maintainers)

Top GitHub Comments

4 reactions
bhheo commented on Jul 29, 2022

Hi

I got the result, and it is almost the same as the official log. set_training_mode=True solves the fine-tuning problem. [image attachment showing the updated fine-tuning results]

Thank you for your advice @TouvronHugo

Best Heo
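
For context on the fix: in the deit training loop, train_one_epoch takes a set_training_mode flag and, roughly, calls model.train(set_training_mode), so passing False keeps the backbone in eval mode and silently disables dropout and stochastic depth (--drop-path 0.2 in the run above). A minimal sketch of that mechanism, not the exact upstream code:

    import timm
    import torch

    # Hypothetical stand-in for the model in the issue (deit_base_patch16_LS with --drop-path 0.2).
    model = timm.create_model("vit_base_patch16_224", drop_path_rate=0.2)

    def train_one_epoch_sketch(model: torch.nn.Module, set_training_mode: bool = True) -> None:
        # The key line: with set_training_mode=False the model stays in eval mode,
        # so dropout and drop-path never fire during "training".
        model.train(set_training_mode)
        print("model.training =", model.training)

    train_one_epoch_sketch(model, set_training_mode=False)  # regularization silently off -> degraded finetuning
    train_one_epoch_sketch(model, set_training_mode=True)   # the fix reported above

With set_training_mode=True the regularizers are active again, which matches the recovery reported in the comment above.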

3 reactions
bhheo commented on Jul 28, 2022

@TouvronHugo Oh, that looks critical. I will test it ASAP.

Best Heo
