
load_from_checkpoint: checkpoint['module_arguments'] KeyError

See original GitHub issue

After training, I load my best checkpoint and run trainer.test. This fails with the following error in v0.7.6. Have people encountered this before? My unit tests, which don't call finetune.py through the command line, do not encounter this issue.

Thanks in advance! Happy to make a reproducible example if this is a new/unknown bug.

    model = model.load_from_checkpoint(checkpoints[-1])
  File "/home/shleifer/miniconda3/envs/nb/lib/python3.7/site-packages/pytorch_lightning/core/lightning.py", line 1563, in load_from_checkpoint
    checkpoint[CHECKPOINT_KEY_MODULE_ARGS].update(kwargs)

KeyError: 'module_arguments'

model is a pl.LightningModule; checkpoints[-1] was saved by it, with the save_weights_only=True kwarg specified.
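
For context, here is a minimal sketch of the failing pattern; the module, its arguments, and the checkpoint path are illustrative, not taken from the report:

    import pytorch_lightning as pl
    import torch

    class ToyModule(pl.LightningModule):  # hypothetical module, for illustration only
        def __init__(self, hidden_dim=32):
            super().__init__()
            self.layer = torch.nn.Linear(hidden_dim, 1)

    # ... train, checkpointing with save_weights_only=True ...

    # If the saved checkpoint carries no hyperparameter entry, rebuilding
    # the module from it raises the KeyError shown above:
    model = ToyModule.load_from_checkpoint("best.ckpt")  # KeyError: 'module_arguments'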

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

2 reactions
wolterlw commented, Sep 22, 2020

Still having issues when loading a checkpoint. When I manually examine the checkpoint saved by Lightning, it contains only the following keys:

    ['epoch', 'global_step', 'pytorch-lightning_version', 'checkpoint_callback_best_model_score', 'checkpoint_callback_best_model_path', 'optimizer_states', 'lr_schedulers', 'state_dict']

so when I try Module.load_from_checkpoint it fails, because the module's init parameters are not present in the checkpoint. OmegaConf is used to instantiate the module like this: lm = Module(**config.lightning_module_conf)

pytorch_lightning version 0.9.0
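
One possible workaround here, sketched under the assumption that PL 0.9.x forwards extra keyword arguments from load_from_checkpoint to the module's __init__ (the checkpoint path is illustrative):

    # Supply the init arguments the checkpoint lacks, reusing the same
    # OmegaConf config that instantiates the module:
    lm = Module.load_from_checkpoint(
        "path/to/checkpoint.ckpt",      # illustrative path
        **config.lightning_module_conf,
    )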

1 reaction
wolterlw commented, Oct 30, 2020

@FluidSense did you call self.save_hyperparameters() in LightningModule's __init__?
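
That suggestion, sketched out (the module and its arguments are illustrative):

    import pytorch_lightning as pl
    import torch

    class Module(pl.LightningModule):
        def __init__(self, hidden_dim=32, lr=1e-3):
            super().__init__()
            # Records the init arguments under self.hparams and writes them
            # into every checkpoint, so load_from_checkpoint can rebuild
            # the module without the arguments being passed again.
            self.save_hyperparameters()
            self.layer = torch.nn.Linear(hidden_dim, 1)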

Read more comments on GitHub >

Top Results From Across the Web

Unable to load model from checkpoint in Pytorch-Lightning
Cause: this happens because your model is unable to load the hyperparameters (n_channels, n_classes=5) from the checkpoint, as you do not save ...

Hparams not restored when using load_from_checkpoint ...
Problem: I'm having an issue where the model is training fine, and the saved checkpoint does indeed have the hparams used in training ...

[solved] KeyError: 'unexpected key "module.encoder ...
I am getting the following error while trying to load a saved model. KeyError: 'unexpected key "module.encoder.embedding.weight" in state_dict' This is the ...

Saving and Loading Models — PyTorch Tutorials 1.0.0 ...
Contents: What is a state_dict? Saving & Loading Model for Inference; Saving & Loading a General Checkpoint; Saving Multiple Models in One File; ...

Model Checkpointing — DeepSpeed 0.8.0 documentation
DeepSpeed provides routines for checkpointing model state during training. Loading Training Checkpoints: deepspeed.DeepSpeedEngine.load_checkpoint(self ...
