Error finetuning from pretrained checkpoint

Hi all, I’m running into an error when trying to fine-tune from one of the pretrained checkpoints.

Code

!mkdir "$output"
!wget -q -O "$output/checkpoint.pth" https://dl.fbaipublicfiles.com/dino/dino_deitsmall16_pretrain/dino_deitsmall16_pretrain.pth

!python -m torch.distributed.launch \
  --nproc_per_node=1 ./dino/main_dino.py \
  --arch deit_small \
  --data_path "$input" \
  --output_dir "$output"

Error

| distributed init (rank 0): env://
git:
  sha: 8aa93fdc90eae4b183c4e3c005174a9f634ecfbf, status: clean, branch: main

arch: deit_small
batch_size_per_gpu: 64
...
...
Student and Teacher are built: they are both deit_small network.
Loss, optimizer and schedulers ready.
Found checkpoint at ./drive/MyDrive/DINO/checkpoint.pth
=> failed to load student from checkpoint './drive/MyDrive/DINO/checkpoint.pth'
=> failed to load teacher from checkpoint './drive/MyDrive/DINO/checkpoint.pth'
=> failed to load optimizer from checkpoint './drive/MyDrive/DINO/checkpoint.pth'
=> failed to load fp16_scaler from checkpoint './drive/MyDrive/DINO/checkpoint.pth'
=> failed to load dino_loss from checkpoint './drive/MyDrive/DINO/checkpoint.pth'

Any suggestions would be very much appreciated.
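
One way to narrow down "failed to load" messages like the ones above is to compare the parameter names the model expects with the names stored in the checkpoint. The following is a minimal sketch that uses plain lists in place of real state dicts; the function name and the toy key names are hypothetical. With PyTorch, the two inputs would typically be the keys of `model.state_dict()` and the keys of the relevant checkpoint entry.

```python
def summarize_key_mismatch(expected_keys, checkpoint_keys):
    """Return the keys that appear on only one side of the comparison."""
    expected = set(expected_keys)
    found = set(checkpoint_keys)
    return {
        "missing_from_checkpoint": sorted(expected - found),
        "unexpected_in_checkpoint": sorted(found - expected),
    }


# Hypothetical key names, for illustration only.
model_keys = ["module.cls_token", "module.head.mlp.0.weight"]
checkpoint_keys = ["cls_token", "head.projection_head.0.weight"]

report = summarize_key_mismatch(model_keys, checkpoint_keys)
for side, keys in report.items():
    print(side, keys)
```

If every expected key shows up as missing, the difference is probably systematic (a prefix or a renamed submodule) rather than a truncated or corrupted file.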

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 12 (2 by maintainers)

Top GitHub Comments

4 reactions
yadamonk commented, May 13, 2021

Hi @ymathildecaron31

Thank you so much for your wonderful work and all the time you’re putting into helping others build on it.

3 reactions
yadamonk commented, May 8, 2021

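It looks like the checkpoints were trained with a slightly different version of the released code, so some parameter names no longer match. Luckily, it’s not difficult to rename the affected keys. Note that the snippet below downloads the full checkpoint (dino_deitsmall16_pretrain_full_checkpoint.pth) rather than the dino_deitsmall16_pretrain.pth file used in the question.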

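# Download the full checkpoint (it contains 'student', 'teacher', 'epoch' and 'optimizer' entries).
!wget -q -O "checkpoint.pth" https://dl.fbaipublicfiles.com/dino/dino_deitsmall16_pretrain/dino_deitsmall16_pretrain_full_checkpoint.pth

import gc
import torch

# Load on the CPU so the conversion does not need a GPU.
checkpoint = torch.load("checkpoint.pth", map_location="cpu")

# Rename the student keys. Every student key also gets a 'module.' prefix
# (presumably because the student is wrapped in DistributedDataParallel).
student = {}
for key, value in checkpoint['student'].items():
  if "projection_head" in key:
    student['module.' + key.replace("projection_head", "mlp")] = value
  elif "prototypes" in key:
    student['module.' + key.replace("prototypes", "last_layer")] = value
  else:
    student['module.' + key] = value

# Rename the teacher keys the same way, but without the 'module.' prefix.
teacher = {}
for key, value in checkpoint['teacher'].items():
  if "projection_head" in key:
    teacher[key.replace("projection_head", "mlp")] = value
  elif "prototypes" in key:
    teacher[key.replace("prototypes", "last_layer")] = value
  else:
    teacher[key] = value

# Overwrite the downloaded file with the converted checkpoint.
torch.save({
  'student': student,
  'teacher': teacher,
  'epoch': checkpoint['epoch'],
  'optimizer': checkpoint['optimizer'],
}, "checkpoint.pth")

# Free the memory held by the loaded weights.
del checkpoint, student, teacher
gc.collect();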

With the converted checkpoint, training starts at a much lower loss, and I see the following message:

Found checkpoint at ./checkpoint.pth
=> loaded student from checkpoint './checkpoint.pth' with msg <All keys matched successfully>
=> loaded teacher from checkpoint './checkpoint.pth' with msg <All keys matched successfully>
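
The renaming in the comment above follows one pattern for both networks: replace at most one substring per key, then optionally add a prefix. A minimal sketch of that pattern as a reusable helper is shown here, using plain dictionaries so it runs without PyTorch. The helper name `rename_state_dict_keys` and the toy keys are hypothetical; only the substrings `projection_head`, `prototypes`, `mlp`, `last_layer` and the `module.` prefix come from the original snippet.

```python
def rename_state_dict_keys(state_dict, replacements, prefix=""):
    """Return a new dict with renamed keys; values are passed through untouched."""
    renamed = {}
    for key, value in state_dict.items():
        for old, new in replacements.items():
            if old in key:
                key = key.replace(old, new)
                break  # apply at most one rule per key, like the if/elif chain
        renamed[prefix + key] = value
    return renamed


# Substring replacements taken from the original snippet.
REPLACEMENTS = {"projection_head": "mlp", "prototypes": "last_layer"}

# Hypothetical toy state dict; real values would be tensors.
toy_state = {
    "head.projection_head.0.weight": 1,
    "head.prototypes.weight": 2,
    "backbone.cls_token": 3,
}

student_like = rename_state_dict_keys(toy_state, REPLACEMENTS, prefix="module.")
teacher_like = rename_state_dict_keys(toy_state, REPLACEMENTS)
print(sorted(student_like))
print(sorted(teacher_like))
```

Applied to a real checkpoint, the same helper would be called once on `checkpoint['student']` with `prefix="module."` and once on `checkpoint['teacher']` without a prefix.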