
Resume training by specifying the model you'd like to resume from using `--restore-file <path to checkpoint>`.


Yes, you can resume training by specifying the model you'd like to resume from using `--restore-file <path to checkpoint>`.
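As a concrete sketch, a resumed fairseq training run might look like the following. This assumes a working fairseq installation; the dataset path, save directory, and architecture flag are placeholders for your own setup, not values from the issue.

```shell
# Sketch: resume training from the last saved checkpoint.
# data-bin/my-dataset, checkpoints/, and --arch lstm are illustrative.
fairseq-train data-bin/my-dataset \
    --arch lstm \
    --save-dir checkpoints \
    --restore-file checkpoints/checkpoint_last.pt
```

`--restore-file` loads the optimizer and model state from the given checkpoint, so training continues rather than starting from scratch.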

_Originally posted by @lematt1991 in https://github.com/pytorch/fairseq/issues/1182#issuecomment-535507612_

The first model was trained with the LSTM architecture, and the second was also LSTM, trained with the `--restore-file` option. Both were trained on separate data files (same language pair). Error: architecture mismatch.


Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

2 reactions
aastha19 commented, Dec 5, 2019

@echan00 The bin files were created separately, and a common dictionary was also created. This common dictionary was placed in the bin directories for both datasets, replacing the dictionaries created at pre-processing time. Somehow it worked for me!
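A less error-prone way to get the same result than hand-copying dictionary files is to pass the first dataset's dictionaries to `fairseq-preprocess` when binarizing the second dataset. The language pair and all paths below are illustrative assumptions, not values from the issue.

```shell
# Sketch: binarize the second dataset reusing the dictionaries from the
# first, so both bin directories share one vocabulary.
# Language codes and paths are placeholders.
fairseq-preprocess \
    --source-lang en --target-lang de \
    --trainpref data2/train --validpref data2/valid \
    --srcdict data-bin/first/dict.en.txt \
    --tgtdict data-bin/first/dict.de.txt \
    --destdir data-bin/second
```

With `--srcdict`/`--tgtdict`, fairseq skips building new dictionaries and indexes the second corpus against the existing vocabulary, which is exactly what restoring a checkpoint requires.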

2 reactions
huihuifan commented, Sep 27, 2019

It’s not going to be possible to restore from a checkpoint where the vocabulary size is different… the input/output embedding matrices are going to be the wrong size. This is not a bug with the code. You need to decide how you want to handle this, most likely you want to re-process your second dataset with the same dictionary as the first.
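The shape mismatch the maintainer describes is easy to reproduce outside fairseq. This minimal PyTorch sketch (not fairseq code; vocabulary sizes are made up) shows why a checkpoint trained with one vocabulary cannot be loaded into a model built for another:

```python
import torch.nn as nn

# An embedding trained with a 1000-word vocabulary cannot be loaded into
# a model built for a 1200-word vocabulary: the weight matrices differ
# in shape, so load_state_dict refuses the checkpoint.
old_model = nn.Embedding(num_embeddings=1000, embedding_dim=16)
new_model = nn.Embedding(num_embeddings=1200, embedding_dim=16)

try:
    new_model.load_state_dict(old_model.state_dict())
except RuntimeError as err:
    # PyTorch reports "size mismatch for weight: ..." here.
    print("checkpoint rejected:", "size mismatch" in str(err))
```

Re-processing the second dataset with the first dataset's dictionary keeps the vocabulary (and hence the embedding shapes) identical, which is why that fix works.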


Top Results From Across the Web

  • Resume training by specifying the model you'd like to ... - GitHub
    Yes, you can resume training by specifying the model you'd like to resume from using `--restore-file <path to checkpoint>`.
  • Checkpointing — PyTorch Lightning 1.6.2 documentation
    Checkpointing your training allows you to resume a training process in case it was interrupted, fine-tune a model or use a pre-trained model...
  • Resuming Training and Checkpoints in Python TensorFlow ...
    In this video, I show how to halt training and continue with Keras.
  • Saving and Loading Your Model to Resume Training in PyTorch
    So in this post, we will be talking about how to save your model in the form of checkpoints and how to load...
  • Keras: Starting, stopping, and resuming training
    Learning how to start, stop, and resume training a deep learning model is a super important skill to master — at some point...
