Resume training by specifying the model you'd like to resume from using `--restore-file <path to checkpoint>`.
Yes, you can resume training by specifying the model you'd like to resume from using `--restore-file <path to checkpoint>`.
_Originally posted by @lematt1991 in https://github.com/pytorch/fairseq/issues/1182#issuecomment-535507612_
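As an illustration, a resume run might look like the sketch below. The data directory, architecture, checkpoint path, and optimizer settings are placeholders, not values from the original issue.

```bash
# Hypothetical paths; adjust to your own data-bin directory and checkpoint.
fairseq-train data-bin/dataset2 \
    --arch lstm \
    --restore-file checkpoints/dataset1/checkpoint_best.pt \
    --save-dir checkpoints/dataset2 \
    --optimizer adam --lr 1e-3 \
    --max-tokens 4000
```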
The first model was trained with the LSTM architecture, and the second was also LSTM, launched with the `--restore-file` option. The two runs used separately preprocessed data files for the same language pair. Error: architecture mismatch.
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@echan00 The bin files were created separately, and a common dictionary was also created. This common dictionary was then placed in the bin directories for both datasets, replacing the dictionaries produced at preprocessing time. Somehow it worked for me!
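A rough sketch of that workaround, using hypothetical paths and a hypothetical de-en language pair; whether simply swapping dictionary files is safe depends on how the data were binarized, so the cleaner route is the one described in the next comment.

```bash
# Sketch of the workaround above (hypothetical paths and language pair):
# drop one shared dictionary into both binarized data directories,
# overwriting the dictionaries fairseq-preprocess wrote for each dataset.
for d in data-bin/dataset1 data-bin/dataset2; do
    cp common-dict/dict.de.txt "$d"/dict.de.txt
    cp common-dict/dict.en.txt "$d"/dict.en.txt
done
```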
It's not going to be possible to restore from a checkpoint where the vocabulary size is different: the input/output embedding matrices will be the wrong size. This is not a bug in the code. You need to decide how you want to handle this; most likely you want to re-process your second dataset with the same dictionary as the first.
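A hedged sketch of that suggestion: re-run fairseq-preprocess on the second dataset while pointing `--srcdict`/`--tgtdict` at the dictionaries produced for the first dataset, so both binarized sets share one vocabulary and the embedding sizes match. The paths and the de-en language pair below are placeholders.

```bash
# Re-binarize the second dataset with the first dataset's dictionaries
# (hypothetical paths and language pair).
fairseq-preprocess --source-lang de --target-lang en \
    --trainpref data/dataset2/train --validpref data/dataset2/valid \
    --srcdict data-bin/dataset1/dict.de.txt \
    --tgtdict data-bin/dataset1/dict.en.txt \
    --destdir data-bin/dataset2
```

After re-binarizing, the earlier `fairseq-train` command with `--restore-file` can be reused on `data-bin/dataset2`.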