Loading a model from PL 1.2 that was saved in PL 1.1 breaks


🐛 Bug

I trained and saved a model with PL 1.1, then tried to load it from an environment running PL 1.2, and loading fails. Some PL-specific objects get pickled into the checkpoint, which shouldn't happen. See the error below:

Traceback (most recent call last):
  File "scripts/train_bart_seq2seq_augmented_kilt.py", line 45, in <module>
    model = BartSeq2SeqAugmented(**vars(args))
  File "/home/ndecao/modify-transformers-memory/src/models/bart_seq2seq_augmented_kilt.py", line 67, in __init__
    self.model = BartSeq2Seq.load_from_checkpoint(self.hparams.model_checkpoint)
  File "/home/ndecao/.anaconda3/envs/kilt37/lib/python3.7/site-packages/pytorch_lightning/core/saving.py", line 134, in load_from_checkpoint
    checkpoint = pl_load(checkpoint_path, map_location=lambda storage, loc: storage)
  File "/home/ndecao/.anaconda3/envs/kilt37/lib/python3.7/site-packages/pytorch_lightning/utilities/cloud_io.py", line 32, in load
    return torch.load(f, map_location=map_location)
  File "/home/ndecao/.anaconda3/envs/kilt37/lib/python3.7/site-packages/torch/serialization.py", line 594, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/home/ndecao/.anaconda3/envs/kilt37/lib/python3.7/site-packages/torch/serialization.py", line 853, in _load
    result = unpickler.load()
AttributeError: Can't get attribute '_gpus_arg_default' on <module 'pytorch_lightning.utilities.argparse_utils'
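
The final AttributeError is the kind of error pickle raises when a pickled reference points at a module-level name that no longer exists. The following is a minimal stdlib-only sketch of that mechanism, not the PL code path itself: `fake_utils` and `_default_arg` are made-up names standing in for a library module and one of its functions, and the "upgrade" is simulated by deleting the attribute.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a library module; this is not part of PL.
fake_utils = types.ModuleType("fake_utils")


def _default_arg(x):
    return int(x)


# Make the function look as if it were defined inside fake_utils.
_default_arg.__module__ = "fake_utils"
fake_utils._default_arg = _default_arg
sys.modules["fake_utils"] = fake_utils

# pickle stores only the reference "fake_utils._default_arg", not the code.
payload = pickle.dumps({"gpus": fake_utils._default_arg})

# Simulate a library upgrade that removes or renames the function.
del fake_utils._default_arg

try:
    pickle.loads(payload)
    load_error = None
except AttributeError as err:
    load_error = err

print(type(load_error).__name__, load_error)

# Putting a placeholder back under the old name lets the payload load again.
fake_utils._default_arg = lambda x: x
restored = pickle.loads(payload)
print(sorted(restored))
```

The last three lines show that the payload itself is intact and only the name lookup fails. Whether re-adding a placeholder under the old name is an acceptable workaround for a real checkpoint is an assumption to verify case by case.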

Expected behavior

The model should load without any error.

Environment

The model was trained and saved using PL 1.1.6 and loaded with PL 1.2.1.

  • PyTorch Version (e.g., 1.0): 1.7.1
  • OS (e.g., Linux): Linux
  • Python version: 3.9

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 11 (4 by maintainers)

Top GitHub Comments

awaelchli commented, Apr 8, 2021

It’s because the args contain a function, and you pass them into the model, which saves all args into the checkpoint (not your fault). Unpickling is then not possible outside a PL environment (or once the PL code changes). I believe I have a fix for this: #6898. @Borda, do you have a suggestion for a good place to add a test for this?

Reproduces with:

from argparse import ArgumentParser
from pl_examples.bug_report_model import BoringModel
from pytorch_lightning import Trainer


class Model(BoringModel):

    def __init__(self, gpus=None):
        super().__init__()
        print(gpus)
        self.save_hyperparameters()
        print(self.hparams)


if __name__ == "__main__":
    parser = ArgumentParser()
    parser = Trainer.add_argparse_args(parser)
    args = parser.parse_args(["--gpus", "2"])
    print(args.gpus)
    model = Model(args.gpus)
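
On the saving side, one way to keep functions out of the stored hyperparameters is to filter the parsed args before handing them to the model. This is only a sketch under stated assumptions, not the fix referenced as #6898: `drop_callables` and `_hypothetical_default` are illustrative names, and the parser below uses plain argparse rather than `Trainer.add_argparse_args`.

```python
from argparse import ArgumentParser, Namespace


def drop_callables(args: Namespace) -> dict:
    """Return the parsed args as a dict without function-valued entries."""
    return {key: value for key, value in vars(args).items() if not callable(value)}


def _hypothetical_default(x):
    # Stand-in for a library function used as an argparse default.
    return int(x)


parser = ArgumentParser()
# A non-string default is stored as-is, so args.gpus is the function itself.
parser.add_argument("--gpus", default=_hypothetical_default)
parser.add_argument("--lr", type=float, default=0.1)

args = parser.parse_args([])
clean = drop_callables(args)
print(clean)
```

Passing `clean` (for example as `Model(**clean)`) instead of the raw namespace means only plain values can end up in the checkpoint. The trade-off is that any argument whose value is legitimately a callable gets dropped as well.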

nicola-decao commented, Mar 16, 2021

@Borda I’ll try to reproduce using the BoringModel
