How to train a DeiT student model on my dataset?
Thanks for your code, it is really good! But I am confused about the distillation token when using the code to train on my data. Can you help me?
According to the GitHub tutorial, I trained DeiT with command 1:
python main.py --model deit_base_distilled_patch16_224 --batch-size 256 --data-path /path/to/imagenet --output_dir /path/to/save
This trains DeiT from scratch with no fine-tuning (and, if I read main.py right, --distillation-type defaults to none, so no distillation loss either).
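For comparison, I think the fully distilled recipe would add the distillation flags, something like the line below (the RegNetY-160 teacher and its checkpoint URL are what I understand the repo's defaults to be, so treat this as my guess):
python main.py --model deit_base_distilled_patch16_224 --batch-size 256 --data-path /path/to/imagenet --output_dir /path/to/save --distillation-type hard --teacher-model regnety_160 --teacher-path https://dl.fbaipublicfiles.com/deit/regnety_160-a5fe301d.pth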
Then I tried command 2:
python main.py --model deit_base_patch16_384 --batch-size 32 --finetune https://dl.fbaipublicfiles.com/deit/deit_base_patch16_224-b5f2ef4d.pth --input-size 384 --lr 5e-6 --weight-decay 1e-8 --epochs 30 --min-lr 5e-6
Although I got a good model this way, I still want to reproduce the student model on my data. Commands 1 and 2 both give me teacher-style models on my data, like the first row in the following picture (maybe?).
But how can I reach 84.0% acc, or even 84.5% acc?
PS: I tried using the model trained with command 2 as the teacher, then trained a student model with command 3:
python main.py --model deit_base_patch16_384 --batch-size 32 --distillation-type hard --teacher-model deit_base_distilled_patch16_224 --teacher-path /path/to/save/checkpoint.pth
But sadly I got low acc.
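Maybe the problem is that command 3 uses the plain deit_base_patch16_384 as the student, which has no distillation token? Here is a sketch of what I guess the student command should be (assuming the distilled variant deit_base_distilled_patch16_384 as the student and the command-2 checkpoint as the teacher; paths are placeholders):
python main.py --model deit_base_distilled_patch16_384 --batch-size 32 --input-size 384 --distillation-type hard --teacher-model deit_base_patch16_384 --teacher-path /path/to/save/checkpoint.pth --data-path /path/to/my/dataset --output_dir /path/to/save_student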
Top GitHub Comments
Hi @TouvronHugo, thanks for your answer! Issue #70 explains how to train on ImageNet, which is a little different from my question.
As I understand it, the DeiT paper introduces a two-step training method: first train a teacher model on the dataset, then train the student model. So, for my dataset, should I first train a teacher on it and then train the student on it (see the sketch below)? And why does the GitHub tutorial suggest fine-tuning DeiT without the distillation loss?
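To make my question concrete, this is the two-step pipeline I have in mind (paths are placeholders; step 2 is essentially the corrected version of my command 3, with a distilled student, so please tell me if this is wrong):
# step 1: fine-tune a teacher on my dataset (command 2 above)
python main.py --model deit_base_patch16_384 --batch-size 32 --finetune https://dl.fbaipublicfiles.com/deit/deit_base_patch16_224-b5f2ef4d.pth --input-size 384 --lr 5e-6 --weight-decay 1e-8 --epochs 30 --min-lr 5e-6 --data-path /path/to/my/dataset --output_dir /path/to/teacher
# step 2: distill a student from that teacher
python main.py --model deit_base_distilled_patch16_384 --batch-size 32 --input-size 384 --distillation-type hard --teacher-model deit_base_patch16_384 --teacher-path /path/to/teacher/checkpoint.pth --data-path /path/to/my/dataset --output_dir /path/to/student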
@TouvronHugo Thanks.