How to train a DeiT student model on my dataset?
Thanks for your code, it is really good! But I am confused about the distillation token when using the code to train on my data. Can you help me?
According to the GitHub tutorial, I trained DeiT with command 1:
python main.py --model deit_base_distilled_patch16_224 --batch-size 256 --data-path /path/to/imagenet --output_dir /path/to/save
This trains DeiT from scratch with no fine-tuning (and, if I read main.py right, --distillation-type defaults to none, so no distillation loss either).
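For comparison, I think the fully distilled recipe would add the distillation flags, something like the line below (the RegNetY-160 teacher and its checkpoint URL are what I understand the repo's defaults to be, so treat this as my guess):
python main.py --model deit_base_distilled_patch16_224 --batch-size 256 --data-path /path/to/imagenet --output_dir /path/to/save --distillation-type hard --teacher-model regnety_160 --teacher-path https://dl.fbaipublicfiles.com/deit/regnety_160-a5fe301d.pth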
Then I tried command 2:
python main.py --model deit_base_patch16_384 --batch-size 32 --finetune https://dl.fbaipublicfiles.com/deit/deit_base_patch16_224-b5f2ef4d.pth --input-size 384 --lr 5e-6 --weight-decay 1e-8 --epochs 30 --min-lr 5e-6
Although I got a good model this way, I still want to reproduce the student model on my data. Commands 1 and 2 both give me teacher-style models on my data, like the first row in the following picture (maybe?).
But how can I reach 84.0% acc, or even 84.5% acc?
PS: I tried using the model trained with command 2 as the teacher, then trained a student model with command 3:
python main.py --model deit_base_patch16_384 --batch-size 32 --distillation-type hard --teacher-model deit_base_distilled_patch16_224 --teacher-path /path/to/save/checkpoint.pth
But sadly I got low acc.
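Maybe the problem is that command 3 uses the plain deit_base_patch16_384 as the student, which has no distillation token? Here is a sketch of what I guess the student command should be (assuming the distilled variant deit_base_distilled_patch16_384 as the student and the command-2 checkpoint as the teacher; paths are placeholders):
python main.py --model deit_base_distilled_patch16_384 --batch-size 32 --input-size 384 --distillation-type hard --teacher-model deit_base_patch16_384 --teacher-path /path/to/save/checkpoint.pth --data-path /path/to/my/dataset --output_dir /path/to/save_student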
Top GitHub Comments
Hi @TouvronHugo, thanks for your answer! Issue #70 explains how to train on ImageNet, which is a little different from my question.
As I understand it, the DeiT paper introduces a two-step training method: first train a teacher model on the dataset, then train the student model. So, for my dataset, should I first train a teacher on it and then train the student on it (see the sketch below)? And why does the GitHub tutorial suggest fine-tuning DeiT without the distillation loss?
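To make my question concrete, this is the two-step pipeline I have in mind (paths are placeholders; step 2 is essentially the corrected version of my command 3, with a distilled student, so please tell me if this is wrong):
# step 1: fine-tune a teacher on my dataset (command 2 above)
python main.py --model deit_base_patch16_384 --batch-size 32 --finetune https://dl.fbaipublicfiles.com/deit/deit_base_patch16_224-b5f2ef4d.pth --input-size 384 --lr 5e-6 --weight-decay 1e-8 --epochs 30 --min-lr 5e-6 --data-path /path/to/my/dataset --output_dir /path/to/teacher
# step 2: distill a student from that teacher
python main.py --model deit_base_distilled_patch16_384 --batch-size 32 --input-size 384 --distillation-type hard --teacher-model deit_base_patch16_384 --teacher-path /path/to/teacher/checkpoint.pth --data-path /path/to/my/dataset --output_dir /path/to/student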
@TouvronHugo Thanks.