Loading pretrained DALLE: mismatch between checkpoint and initialized model
Hello!
So I have a partially trained DALLE model that I wanted to test, but I can't seem to load it.
My vae params were:
{'image_size': 256,
'num_layers': 3,
'num_tokens': 2048,
'codebook_dim': 512,
'hidden_dim': 256,
'num_resnet_blocks': 2}
My dalle params were:
{'dim': 512,
'vae': vae,
'num_text_tokens': 49408,
'text_seq_len': 256,
'depth': 16,
'heads': 16,
'dim_head': 64,
'attn_dropout': 0.1,
'ff_dropout': 0.1,
'reversible': True,
'attn_types': ('full', 'axial_row', 'axial_col', 'conv_like')}
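For reference, the expected axial positional-embedding shapes can be worked out from these settings: the VAE halves the feature map per layer, so with image_size 256 and 3 layers the image token grid is 32 x 32, and the axial embedding factors that grid into a row table and a column table. A minimal sketch of that arithmetic (an assumption about how dalle-pytorch sizes these tables, matching the checkpoint shapes in the error below):

```python
# Sketch (assumption): derive the axial positional-embedding shapes
# from the VAE/DALLE settings above. The VAE downsamples the image
# by a factor of 2 per layer, so the image token grid side is
# image_size // 2**num_layers.
image_size = 256
num_layers = 3
dim = 512

fmap_side = image_size // (2 ** num_layers)  # 256 // 8 = 32

# Axial embeddings factor the 2-D grid into a row and a column table.
weights_0_shape = (1, fmap_side, 1, dim)  # per-row embedding
weights_1_shape = (1, 1, fmap_side, dim)  # per-column embedding

print(weights_0_shape)  # (1, 32, 1, 512)
print(weights_1_shape)  # (1, 1, 32, 512)
```

These are exactly the checkpoint shapes reported in the error, which suggests the checkpoint was saved with the grid sized from the VAE feature map rather than the raw image size.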
When I initialize both models and try to load my state_dict, I get an error in (I believe) the axial attention:
RuntimeError: Error(s) in loading state_dict for DALLE:
size mismatch for image_pos_emb.weights_0: copying a param with shape torch.Size([1, 32, 1, 512]) from checkpoint, the shape in current model is torch.Size([1, 256, 1, 512]).
size mismatch for image_pos_emb.weights_1: copying a param with shape torch.Size([1, 1, 32, 512]) from checkpoint, the shape in current model is torch.Size([1, 1, 256, 512]).
Checking manually confirms that the initialized DALLE model has a mismatch with the pretrained one:
image_pos_emb.weights_0 has shape torch.Size([1, 1, 256, 512]) in the initialized model, while image_pos_emb.weights_0 has shape torch.Size([1, 1, 32, 512]) in the pretrained model.
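One way to see every mismatch at once, rather than failing on the first one, is to compare parameter shapes between the checkpoint and the fresh model directly. A minimal sketch with a hypothetical helper, using plain shape tuples (with PyTorch you would build these dicts as `{k: tuple(v.shape) for k, v in state_dict.items()}`):

```python
# Hypothetical helper: list every parameter whose shape differs between
# a checkpoint state_dict and a freshly initialized model.

def shape_mismatches(ckpt_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for differing keys."""
    return {
        name: (ckpt_shapes[name], model_shapes[name])
        for name in ckpt_shapes.keys() & model_shapes.keys()
        if ckpt_shapes[name] != model_shapes[name]
    }

# Shapes taken from the error message above:
ckpt = {
    'image_pos_emb.weights_0': (1, 32, 1, 512),
    'image_pos_emb.weights_1': (1, 1, 32, 512),
}
model = {
    'image_pos_emb.weights_0': (1, 256, 1, 512),
    'image_pos_emb.weights_1': (1, 1, 256, 512),
}

for name, (a, b) in sorted(shape_mismatches(ckpt, model).items()):
    print(f'{name}: checkpoint {a} vs model {b}')
```

Keys present in only one of the two dicts would also be worth reporting in practice; this sketch only covers shared keys with differing shapes.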
Wonder if I’m doing something wrong here? It’s quite possible I’ve totally missed something!
Issue Analytics
- Created: 3 years ago
- Reactions: 1
- Comments: 6 (2 by maintainers)
Top GitHub Comments
@TheodoreGalanos Oh no, but that model you trained was prior to a bug fix to the positional encoding 🙏 You may have to just retrain it 😦
Never mind, it turns out the mismatch was on my local machine. All fixed after updating the dalle-pytorch package!