
Loading pretrained dalle: mismatch between checkpoint and initialized model

See original GitHub issue

Hello!

So I have a partly trained dalle model that I wanted to test, but I can’t seem to load it.

My vae params were:

{'image_size': 256,
 'num_layers': 3,
 'num_tokens': 2048,
 'codebook_dim': 512,
 'hidden_dim': 256,
 'num_resnet_blocks': 2}

My dalle params were:

{'dim': 512,
'vae': vae,
'num_text_tokens': 49408,
'text_seq_len': 256,
'depth': 16,
'heads': 16,
'dim_head': 64,
'attn_dropout': 0.1,
'ff_dropout': 0.1,
'reversible': True,
'attn_types': ('full', 'axial_row', 'axial_col', 'conv_like')}
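As a side note (and an assumption on my part, since the library could compute this differently): if the axial image positional embedding is sized to the VAE’s latent feature map, each spatial axis should be image_size / 2**num_layers long. With the settings above that works out to 32, which matches the checkpoint shapes in the error below:

```python
# Expected axial positional embedding shapes, assuming each axis is sized to
# the VAE latent grid (image_size / 2**num_layers) rather than the image size.
image_size = 256   # from the vae params above
num_layers = 3
dim = 512          # from the dalle params above

fmap_size = image_size // (2 ** num_layers)  # 256 / 2**3 = 32
weights_0_shape = (1, fmap_size, 1, dim)     # row axis
weights_1_shape = (1, 1, fmap_size, dim)     # column axis

print(fmap_size)        # -> 32
print(weights_0_shape)  # -> (1, 32, 1, 512)
print(weights_1_shape)  # -> (1, 1, 32, 512)
```

If that assumption holds, the checkpoint (32) is consistent with these params, and it is the freshly initialized model (256) that is computing the grid size differently.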

When I initialize both models and try to load my state_dict, I get an error in (I believe) the axial attention:

RuntimeError: Error(s) in loading state_dict for DALLE:
	size mismatch for image_pos_emb.weights_0: copying a param with shape torch.Size([1, 32, 1, 512]) from checkpoint, the shape in current model is torch.Size([1, 256, 1, 512]).
	size mismatch for image_pos_emb.weights_1: copying a param with shape torch.Size([1, 1, 32, 512]) from checkpoint, the shape in current model is torch.Size([1, 1, 256, 512]).

Checking manually confirms that the initialized dalle model has a mismatch with the pretrained one.

image_pos_emb.weights_0 has a dimension of torch.Size([1, 1, 256, 512]) for the initialized model.
image_pos_emb.weights_0 has a dimension of torch.Size([1, 1, 32, 512]) for the pretrained model.

Wonder if I’m doing something wrong here? It’s quite possible I’ve totally missed something!
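In case it helps others hitting similar errors: before calling load_state_dict, you can list every parameter whose shape differs between the checkpoint and a freshly initialized model. A minimal sketch, using plain dicts of shape tuples standing in for real state_dicts (with PyTorch you would compare tuple(tensor.shape) instead):

```python
def shape_mismatches(checkpoint, model_state):
    """Return {name: (checkpoint_shape, model_shape)} for every shape that differs."""
    return {
        name: (ckpt_shape, model_state[name])
        for name, ckpt_shape in checkpoint.items()
        if name in model_state and model_state[name] != ckpt_shape
    }

# Shapes taken from the error message above; 'to_logits.weight' is a
# hypothetical matching entry added for illustration.
ckpt = {
    'image_pos_emb.weights_0': (1, 32, 1, 512),
    'image_pos_emb.weights_1': (1, 1, 32, 512),
    'to_logits.weight': (2048, 512),
}
model_state = {
    'image_pos_emb.weights_0': (1, 256, 1, 512),
    'image_pos_emb.weights_1': (1, 1, 256, 512),
    'to_logits.weight': (2048, 512),
}

for name, (was, now) in shape_mismatches(ckpt, model_state).items():
    print(f"{name}: checkpoint {was} vs model {now}")
```

This surfaces all mismatches at once instead of relying on the first RuntimeError.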

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

1 reaction
lucidrains commented, Mar 16, 2021

@TheodoreGalanos Oh no, but that model you trained was prior to a bug fix to the positional encoding 🙏 You may have to just retrain it 😦

0 reactions
TheodoreGalanos commented, Mar 16, 2021

Never mind, it turns out the mismatch was on my local machine. All fixed after updating the dalle-pytorch package!
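For anyone who lands here without the option of retraining or upgrading: a common partial workaround is to keep only the checkpoint entries whose shapes match the current model and pass the filtered dict to model.load_state_dict(..., strict=False). A sketch with plain shape tuples standing in for tensors; note that any skipped parameter (here, the positional embeddings) stays randomly initialized, so this is for inspection only, not a substitute for retraining:

```python
def filter_matching(checkpoint, model_state):
    """Keep only checkpoint entries whose shape matches the current model."""
    return {
        name: value
        for name, value in checkpoint.items()
        if name in model_state and model_state[name] == value
    }

# Shapes from the error above; 'to_logits.weight' is a hypothetical
# matching entry added for illustration.
ckpt = {
    'image_pos_emb.weights_0': (1, 32, 1, 512),
    'image_pos_emb.weights_1': (1, 1, 32, 512),
    'to_logits.weight': (2048, 512),
}
model_state = {
    'image_pos_emb.weights_0': (1, 256, 1, 512),
    'image_pos_emb.weights_1': (1, 1, 256, 512),
    'to_logits.weight': (2048, 512),
}

loadable = filter_matching(ckpt, model_state)
print(sorted(loadable))  # only the matching parameter survives
```

With real tensors, compare tuple(v.shape) against the model’s own state_dict before loading.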
