`model_name_or_path` does not seem to load in previously trained checkpoints
Environment info
- transformers version: 4.9.0.dev0
- Platform: Linux-5.4.0-1043-gcp-x86_64-with-glibc2.29
- Python version: 3.8.10
- PyTorch version (GPU?): 1.9.0+cu102 (False)
- Tensorflow version (GPU?): 2.5.0 (False)
- Flax version (CPU?/GPU?/TPU?): 0.3.4 (cpu)
- Jax version: 0.2.16
- JaxLib version: 0.1.68
- Using GPU in script?: Using TPU
- Using distributed or parallel set-up in script?: Yes
Information
The model I am using is RoBERTa, and it is part of the Flax community week.
I am trying to load a previously trained model checkpoint by passing the `model_name_or_path` flag to an MLM script, which can be found here, but it seems that the model is initialized with new weights…
Expected behavior
The training loss should continue from where it stopped. Instead, the metrics of the new run simply mimic those of the original training run.
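For illustration, here is a minimal sketch of the branching a training script needs for this flag to take effect. The helper name `load_or_init_model` and its `model_cls` parameter are hypothetical and are not taken from the script in question. It assumes the `FlaxAutoModelForMaskedLM` class from transformers, whose `from_pretrained` restores saved weights while `from_config` creates freshly initialized ones.

```python
def load_or_init_model(model_name_or_path, config, seed=0, model_cls=None):
    """Return a masked-LM model, restoring weights when a checkpoint is given.

    Hypothetical helper for illustration; `model_cls` is injectable so the
    branching can be exercised without transformers or real weights.
    """
    if model_cls is None:
        # Imported lazily so the function can be defined without transformers.
        from transformers import FlaxAutoModelForMaskedLM
        model_cls = FlaxAutoModelForMaskedLM

    if model_name_or_path:
        # Restores the saved parameters from the checkpoint directory or hub id.
        return model_cls.from_pretrained(model_name_or_path, config=config, seed=seed)

    # No checkpoint given: parameters are randomly initialized from the config.
    return model_cls.from_config(config, seed=seed)
```

If a script only ever takes the `from_config` path, the flag would be ignored and training would start from new weights, which would match the symptom described here. Note also that `from_pretrained` restores model parameters only, so optimizer state and the step counter would still start fresh unless they are saved and reloaded separately.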
Issue Analytics
- Created 2 years ago
- Comments: 5 (3 by maintainers)
Top GitHub Comments
@NielsRogge @patrickvonplaten yes of course, sorry!
As mentioned, we are using a modified version of the `run_mlm_flax_stream.py` script, which you can find here. The code used to run the script is the following, where `/home/Z6HJB/roberta-large-scandi/roberta-base-pretrained-scandinavian/` is a directory with a `config.json` and a `flax_model.msgpack`:
Let me know if this suffices or if you need more!
I might be busy for the rest of the day since I have a football match to watch 🇩🇰 🇩🇰 🇩🇰 🇩🇰 🇩🇰 🇩🇰
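As a quick sanity check before debugging the script itself, one could confirm that the checkpoint directory really contains the two files mentioned above. This is a minimal sketch using only the standard library. The helper name `missing_checkpoint_files` is hypothetical, and the file list is taken from the comment above rather than from any library requirement.

```python
from pathlib import Path

# File names as described in the comment above; adjust if the checkpoint differs.
REQUIRED_FILES = ("config.json", "flax_model.msgpack")


def missing_checkpoint_files(checkpoint_dir):
    """Return the expected checkpoint files that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty list means both files are present, in which case the problem more likely lies in how the script uses the path than in the checkpoint directory.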
Can you post a code snippet or make a Colab to reproduce the error?