Model Weights Saved in 0.5.3 may be incompatible with 0.6.0

Describe the bug
Models trained and saved with MONAI 0.5.3 cannot be restored and used in 0.6.0. The keys in the saved state_dict do not match the keys that the 0.6.0 network defines for several layers.

To Reproduce
Clone the UNETR repository (https://github.com/Project-MONAI/research-contributions/tree/master/UNETR/BTCV). Train a model with MONAI 0.5.3 and save the weights. Then load the checkpoint under 0.6.0 using:

    ckpt = torch.load(args.ckpt)
    model.load_state_dict(ckpt['state_dict'])

Expected behavior
The 0.5.3 checkpoint should load. Instead, MONAI 0.6.0 raises an error about mismatched keys when loading the state_dict into the convolution blocks of the network. The following is a representative error:

RuntimeError: Error(s) in loading state_dict for UNETR:
        Unexpected key(s) in state_dict: "encoder1.layer.norm1.weight", "encoder1.layer.norm1.bias", "encoder1.layer.norm2.weight", "encoder1.layer.norm2.bias", "encoder1.layer.norm3.weight", "encoder1.layer.norm3.bias", "encoder10.layer.norm1.weight", "encoder10.layer.norm1.bias", "encoder10.layer.norm2.weight", "encoder10.layer.norm2.bias", "encoder10.layer.norm3.weight", "encoder10.layer.norm3.bias", "decoder5.conv_block.norm1.weight", "decoder5.conv_block.norm1.bias", "decoder5.conv_block.norm2.weight", "decoder5.conv_block.norm2.bias", "decoder5.conv_block.norm3.weight", "decoder5.conv_block.norm3.bias", "decoder4.conv_block.norm1.weight", "decoder4.conv_block.norm1.bias", "decoder4.conv_block.norm2.weight", "decoder4.conv_block.norm2.bias", "decoder4.conv_block.norm3.weight", "decoder4.conv_block.norm3.bias", "decoder3.conv_block.norm1.weight", "decoder3.conv_block.norm1.bias", "decoder3.conv_block.norm2.weight", "decoder3.conv_block.norm2.bias", "decoder3.conv_block.norm3.weight", "decoder3.conv_block.norm3.bias", "decoder2.conv_block.norm1.weight", "decoder2.conv_block.norm1.bias", "decoder2.conv_block.norm2.weight", "decoder2.conv_block.norm2.bias", "decoder2.conv_block.norm3.weight", "decoder2.conv_block.norm3.bias", "decoder1.conv_block.norm1.weight", "decoder1.conv_block.norm1.bias", "decoder1.conv_block.norm2.weight", "decoder1.conv_block.norm2.bias", "decoder1.conv_block.norm3.weight", "decoder1.conv_block.norm3.bias".

These keys belong to the convolutional blocks of the network. In 0.5.3 you can inspect, for example, model.decoder1.conv_block.norm3.bias and get its values (even without loading the state_dict):

tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       requires_grad=True)

But in 0.6.0, model.decoder1.conv_block.norm3.bias returns None.

If this is not fixed, it breaks the compatibility between 0.5.3 and 0.6.0 for restoring models.
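
To see exactly which keys differ before calling load_state_dict, a key comparison along the following lines can help. This is a minimal sketch, not part of MONAI or of the original report: the function name diff_state_dict_keys and the sample key "decoder1.conv_block.conv1.weight" are made up for illustration, while the two norm1 keys are taken from the error above.

```python
from typing import Iterable, List, Tuple


def diff_state_dict_keys(
    checkpoint_keys: Iterable[str], model_keys: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) key lists, using PyTorch's terminology."""
    ckpt, model = set(checkpoint_keys), set(model_keys)
    missing = sorted(model - ckpt)      # defined by the model, absent from the checkpoint
    unexpected = sorted(ckpt - model)   # stored in the checkpoint, unknown to the model
    return missing, unexpected


# Illustrative key names only. With real objects you would pass
# ckpt["state_dict"].keys() and model.state_dict().keys().
saved_keys = [
    "decoder1.conv_block.conv1.weight",  # hypothetical key present in both versions
    "decoder1.conv_block.norm1.weight",
    "decoder1.conv_block.norm1.bias",
]
current_keys = ["decoder1.conv_block.conv1.weight"]

missing, unexpected = diff_state_dict_keys(saved_keys, current_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

An empty "missing" list together with a non-empty "unexpected" list matches the pattern in the error message: the checkpoint carries parameters that the newer model no longer defines.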

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

wyli commented, Aug 24, 2021 (1 reaction)

I think this ticket makes a good point. @yiheng-wang-nv, perhaps we can update the migration guide to make this clear? As another workaround, is it possible to use copy_model_state to load v0.5.3 checkpoints? https://github.com/Project-MONAI/MONAI/blob/e95500ad1802e91da12f31350ba8e3b85dc36d11/monai/networks/utils.py#L339
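
The workaround suggested here amounts to copying only those checkpoint entries that the new model can accept. The sketch below illustrates that general idea with plain Python mappings. It is not MONAI's copy_model_state and makes no claim about that function's signature or behavior. The name filter_compatible and the sample keys are hypothetical, and SimpleNamespace objects stand in for tensors so the example runs without PyTorch.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Mapping, Tuple


def filter_compatible(
    checkpoint_state: Mapping[str, Any], model_state: Mapping[str, Any]
) -> Tuple[Dict[str, Any], List[str]]:
    """Keep checkpoint entries whose key exists in the model with the same shape."""
    kept: Dict[str, Any] = {}
    skipped: List[str] = []
    for key, value in checkpoint_state.items():
        if key in model_state and value.shape == model_state[key].shape:
            kept[key] = value
        else:
            skipped.append(key)  # unknown key or shape mismatch
    return kept, skipped


# Stand-ins for tensors: only the .shape attribute matters here.
old_checkpoint = {
    "conv.weight": SimpleNamespace(shape=(48, 1, 3, 3, 3)),  # hypothetical key
    "norm1.bias": SimpleNamespace(shape=(48,)),              # hypothetical key
}
new_model_state = {"conv.weight": SimpleNamespace(shape=(48, 1, 3, 3, 3))}

kept, skipped = filter_compatible(old_checkpoint, new_model_state)
print("kept:", list(kept))
print("skipped:", skipped)
```

With real PyTorch objects, the kept entries could then be passed to model.load_state_dict(kept, strict=False). Any skipped parameters are discarded, so a model restored this way may not reproduce the outputs of the original 0.5.3 model.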

In general, MONAI follows semantic versioning. At this stage we are at major version zero (0.y.z), which indicates initial development: anything may change at any time, and the public API should not be considered stable.
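
One way to act on this is to record the library version alongside the weights and compare it at load time. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, version strings are assumed to start with plain numeric major and minor parts (such as "0.5.3"), and the rule of treating any minor bump under 0.y.z as potentially breaking is a conservative reading of the comment above, not a MONAI policy.

```python
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '0.5.3'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def may_be_incompatible(saved_with: str, loading_with: str) -> bool:
    """Flag checkpoints written by a release that may differ in layout."""
    saved = parse_major_minor(saved_with)
    current = parse_major_minor(loading_with)
    if saved[0] == 0 or current[0] == 0:
        # Under 0.y.z, treat any change in the minor version as potentially breaking.
        return saved != current
    # From 1.0.0 onward, only a major version change signals breaking changes.
    return saved[0] != current[0]


# The version could be stored at save time under a key of your choosing,
# for example from monai.__version__, and read back before loading the weights.
print(may_be_incompatible("0.5.3", "0.6.0"))
print(may_be_incompatible("0.6.0", "0.6.1"))
```

A True result is only a prompt to check the migration guide or compare keys before loading; it does not prove that the checkpoint will fail to load.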

yiheng-wang-nv commented, Aug 26, 2021 (0 reactions)

Thanks @wyli, I've updated the guide.
