`model_name_or_path` does not seem to load in previously trained checkpoints

Environment info

  • transformers version: 4.9.0.dev0
  • Platform: Linux-5.4.0-1043-gcp-x86_64-with-glibc2.29
  • Python version: 3.8.10
  • PyTorch version (GPU?): 1.9.0+cu102 (False)
  • Tensorflow version (GPU?): 2.5.0 (False)
  • Flax version (CPU?/GPU?/TPU?): 0.3.4 (cpu)
  • Jax version: 0.2.16
  • JaxLib version: 0.1.68
  • Using GPU in script?: Using TPU
  • Using distributed or parallel set-up in script?: Yes

Information

The model I am using is RoBERTa, and it is part of the Flax community week.

I am trying to load a previously trained model checkpoint by passing the `model_name_or_path` flag to an MLM script, which can be found here, but the model seems to be initialized with new weights instead.
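In transformers, a Flax training script typically restores trained weights with `FlaxAutoModelForMaskedLM.from_pretrained(...)`, whereas `FlaxAutoModelForMaskedLM.from_config(...)` builds a model with freshly initialized parameters. If a script only ever takes the second path, the flag would not affect the weights at all, which would match the symptom described here. This is one plausible explanation, not one confirmed in this thread. The sketch below is a hypothetical, stdlib-only illustration of the decision such a script has to make; `choose_init` and its return strings are illustrative names, not part of transformers or `run_mlm_flax_stream.py`.

```python
import os
import tempfile
from typing import Optional

# Name of the weights file that Flax models write with save_pretrained().
FLAX_WEIGHTS_FILE = "flax_model.msgpack"


def choose_init(model_name_or_path: Optional[str]) -> str:
    """Hypothetical helper: decide how a training script should build its model.

    "from_pretrained" -> restore trained parameters from a checkpoint.
    "from_config"     -> start from freshly (randomly) initialized parameters.
    """
    if not model_name_or_path:
        # No checkpoint requested: training starts from scratch.
        return "from_config"
    if os.path.isdir(model_name_or_path):
        weights = os.path.join(model_name_or_path, FLAX_WEIGHTS_FILE)
        if not os.path.isfile(weights):
            # Fail loudly instead of silently falling back to random weights.
            raise FileNotFoundError(f"no {FLAX_WEIGHTS_FILE} in {model_name_or_path}")
        return "from_pretrained"
    # Not a local directory: assume a Hub model id that from_pretrained resolves.
    return "from_pretrained"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as ckpt_dir:
        # Simulate a checkpoint directory like the one described in the issue.
        open(os.path.join(ckpt_dir, FLAX_WEIGHTS_FILE), "wb").close()
        print(choose_init(ckpt_dir))  # from_pretrained
    print(choose_init(None))  # from_config
```

Raising on a directory that lacks `flax_model.msgpack` is a deliberate design choice in this sketch: silently falling back to random initialization is exactly the failure mode reported in this issue.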

Expected behavior

I expected the training loss to continue from where the previous run stopped. Instead, the new run's metrics simply mimic those of the earlier training run, as if training had started over.
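One rough way to tell from the logs whether a run resumed or restarted: a freshly initialized masked-LM head predicts close to uniformly over the vocabulary, so its cross-entropy loss starts near ln(V), where V is the vocabulary size. The hypothetical helper below is a heuristic sketch, not part of any library, and it assumes you have the first logged loss of the new run and the last logged loss of the previous run.

```python
import math


def likely_reinitialized(first_loss: float, prev_final_loss: float, vocab_size: int) -> bool:
    """Heuristic check (hypothetical helper, not from transformers).

    Returns True when the first training loss of a "resumed" run is closer to
    the loss of a uniform prediction, ln(vocab_size), than to the final loss of
    the previous run, which suggests the weights were freshly initialized.
    """
    uniform_loss = math.log(vocab_size)  # cross-entropy of a uniform prediction
    return abs(first_loss - uniform_loss) < abs(first_loss - prev_final_loss)


if __name__ == "__main__":
    # Illustrative numbers only; 50265 is the roberta-base vocabulary size,
    # which may differ from the tokenizer used in this issue.
    print(likely_reinitialized(10.7, 2.1, 50265))  # True: looks like a restart
    print(likely_reinitialized(2.3, 2.1, 50265))  # False: looks like a resume
```

Because learning-rate warmup and data order also affect early losses, treat this only as a first hint before inspecting the loaded parameters directly.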

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments:5 (3 by maintainers)

Top GitHub Comments

2 reactions
MalteHB commented, Jul 7, 2021

@NielsRogge @patrickvonplaten yes of course, sorry!

As mentioned, we are using a modified version of the `run_mlm_flax_stream.py` script, which you can find here. The command used to run the script is below, where "/home/Z6HJB/roberta-large-scandi/roberta-base-pretrained-scandinavian/" is a directory containing a `config.json` and a `flax_model.msgpack`:

export MODEL_DIR=/home/Z6HJB/roberta-large-scandi/roberta-base-pretrained-scandinavian/

source /home/Z6HJB/test/bin/activate

python3 ./src/run_mlm_flax_stream.py \
    --model_name_or_path="/home/Z6HJB/roberta-large-scandi/roberta-base-pretrained-scandinavian/" \
    --output_dir="/home/Z6HJB/roberta-large-scandi/model_continued2" \
    --tokenizer_name="${MODEL_DIR}" \
    --dataset_name="mc4" \
    --dataset_config_name="unshuffled_deduplicated_en" \
    --max_seq_length="128" \
    --per_device_train_batch_size="128" \
    --per_device_eval_batch_size="128" \
    --learning_rate="3e-4" \
    --warmup_steps="1000" \
    --overwrite_output_dir \
    --adam_beta1="0.9" \
    --adam_beta2="0.98" \
    --num_train_steps="1000000" \
    --num_eval_samples="5000" \
    --save_steps="1000" \
    --logging_steps="25" \
    --eval_steps="1000" \
    --push_to_hub \
    #--config_name="${MODEL_DIR}" \
    #--model_type="roberta" \

Let me know if this suffices or if you need more!

I might be busy for the rest of the day since I have a football match to watch 🇩🇰 🇩🇰 🇩🇰 🇩🇰 🇩🇰 🇩🇰

2 reactions
NielsRogge commented, Jul 7, 2021

Can you post a code snippet or make a Colab to reproduce the error?


