Loading pretrained DALLE: mismatch between checkpoint and initialized model
Hello!
So I have a partially trained DALLE model that I wanted to test, but I can't seem to load it.
My vae params were:
{'image_size': 256,
'num_layers': 3,
'num_tokens': 2048,
'codebook_dim': 512,
'hidden_dim': 256,
'num_resnet_blocks': 2}
My dalle params were:
{'dim': 512,
'vae': vae,
'num_text_tokens': 49408,
'text_seq_len': 256,
'depth': 16,
'heads': 16,
'dim_head': 64,
'attn_dropout': 0.1,
'ff_dropout': 0.1,
'reversible': True,
'attn_types': ('full', 'axial_row', 'axial_col', 'conv_like')}
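For reference, the expected axial positional-embedding shapes can be worked out from these settings: the VAE halves the feature map per layer, so with image_size 256 and 3 layers the image token grid is 32 x 32, and the axial embedding factors that grid into a row table and a column table. A minimal sketch of that arithmetic (an assumption about how dalle-pytorch sizes these tables, matching the checkpoint shapes in the error below):

```python
# Sketch (assumption): derive the axial positional-embedding shapes
# from the VAE/DALLE settings above. The VAE downsamples the image
# by a factor of 2 per layer, so the image token grid side is
# image_size // 2**num_layers.
image_size = 256
num_layers = 3
dim = 512

fmap_side = image_size // (2 ** num_layers)  # 256 // 8 = 32

# Axial embeddings factor the 2-D grid into a row and a column table.
weights_0_shape = (1, fmap_side, 1, dim)  # per-row embedding
weights_1_shape = (1, 1, fmap_side, dim)  # per-column embedding

print(weights_0_shape)  # (1, 32, 1, 512)
print(weights_1_shape)  # (1, 1, 32, 512)
```

These are exactly the checkpoint shapes reported in the error, which suggests the checkpoint was saved with the grid sized from the VAE feature map rather than the raw image size.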
When I initialize both models and try to load my state_dict, I get an error in (I believe) the axial attention:
RuntimeError: Error(s) in loading state_dict for DALLE:
size mismatch for image_pos_emb.weights_0: copying a param with shape torch.Size([1, 32, 1, 512]) from checkpoint, the shape in current model is torch.Size([1, 256, 1, 512]).
size mismatch for image_pos_emb.weights_1: copying a param with shape torch.Size([1, 1, 32, 512]) from checkpoint, the shape in current model is torch.Size([1, 1, 256, 512]).
Checking manually confirms that the initialized DALLE model has a mismatch with the pretrained one:
image_pos_emb.weights_0 has shape torch.Size([1, 1, 256, 512]) in the initialized model, while image_pos_emb.weights_0 has shape torch.Size([1, 1, 32, 512]) in the pretrained model.
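One way to see every mismatch at once, rather than failing on the first one, is to compare parameter shapes between the checkpoint and the fresh model directly. A minimal sketch with a hypothetical helper, using plain shape tuples (with PyTorch you would build these dicts as `{k: tuple(v.shape) for k, v in state_dict.items()}`):

```python
# Hypothetical helper: list every parameter whose shape differs between
# a checkpoint state_dict and a freshly initialized model.

def shape_mismatches(ckpt_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for differing keys."""
    return {
        name: (ckpt_shapes[name], model_shapes[name])
        for name in ckpt_shapes.keys() & model_shapes.keys()
        if ckpt_shapes[name] != model_shapes[name]
    }

# Shapes taken from the error message above:
ckpt = {
    'image_pos_emb.weights_0': (1, 32, 1, 512),
    'image_pos_emb.weights_1': (1, 1, 32, 512),
}
model = {
    'image_pos_emb.weights_0': (1, 256, 1, 512),
    'image_pos_emb.weights_1': (1, 1, 256, 512),
}

for name, (a, b) in sorted(shape_mismatches(ckpt, model).items()):
    print(f'{name}: checkpoint {a} vs model {b}')
```

Keys present in only one of the two dicts would also be worth reporting in practice; this sketch only covers shared keys with differing shapes.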
Wonder if I’m doing something wrong here? It’s quite possible I’ve totally missed something!
Issue Analytics
- Created: 3 years ago
- Reactions: 1
- Comments: 6 (2 by maintainers)
Top GitHub Comments
@TheodoreGalanos Oh no, but that model you trained was prior to a bug fix to the positional encoding 🙏 You may have to just retrain it 😦
Never mind, it turns out the mismatch was on my local machine. All fixed after updating the dalle-pytorch package!