`model_name_or_path` does not seem to load in previously trained checkpoints

See original GitHub issue

Environment info

transformers version: 4.9.0.dev0
Platform: Linux-5.4.0-1043-gcp-x86_64-with-glibc2.29
Python version: 3.8.10
PyTorch version (GPU?): 1.9.0+cu102 (False)
Tensorflow version (GPU?): 2.5.0 (False)
Flax version (CPU?/GPU?/TPU?): 0.3.4 (cpu)
Jax version: 0.2.16
JaxLib version: 0.1.68
Using GPU in script?: Using TPU
Using distributed or parallel set-up in script?: Yes

Information

The model I am using is RoBERTa, as part of the flax-community week.

I am trying to load a previously trained model checkpoint by setting the `model_name_or_path` flag in an MLM script, which can be found here, but it seems that the model is initialized with new weights…

Expected behavior

The training loss should continue from where the previous run stopped. Instead, the new run's metrics simply mimic those of the original training run.
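
One rough way to tell these two cases apart is to compare the last losses logged by the original run with the first losses logged by the new run. The sketch below is only an illustration: `looks_resumed`, its `window` and `rel_tol` defaults, and the loss lists are hypothetical and not part of the MLM script or of transformers.

```python
from statistics import mean


def looks_resumed(previous_losses, resumed_losses, window=5, rel_tol=0.25):
    """Return True if the new run starts near where the previous run ended.

    Hypothetical helper: `window` and `rel_tol` are arbitrary illustration values.
    """
    if not previous_losses or not resumed_losses:
        raise ValueError("both loss histories must be non-empty")
    # Average the tail of the old run and the head of the new run.
    end_of_previous = mean(previous_losses[-window:])
    start_of_resumed = mean(resumed_losses[:window])
    # Compare the gap between the two averages to a relative tolerance.
    return abs(start_of_resumed - end_of_previous) <= rel_tol * abs(end_of_previous)
```

If the new run's first losses sit far above the old run's final losses, that is consistent with the weights having been re-initialized, although it is not proof on its own.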

Issue Analytics

State:
Created 2 years ago
Comments: 5 (3 by maintainers)

Top GitHub Comments

MalteHB commented, Jul 7, 2021

@NielsRogge @patrickvonplaten yes of course, sorry!

As mentioned, we are using a modified version of the run_mlm_flax_stream.py script, which you can find here. The command used to run the script is below, where "/home/Z6HJB/roberta-large-scandi/roberta-base-pretrained-scandinavian/" is a directory containing a config.json and a flax_model.msgpack:

export MODEL_DIR=/home/Z6HJB/roberta-large-scandi/roberta-base-pretrained-scandinavian/

source /home/Z6HJB/test/bin/activate

python3 ./src/run_mlm_flax_stream.py \
    --model_name_or_path="/home/Z6HJB/roberta-large-scandi/roberta-base-pretrained-scandinavian/" \
    --output_dir="/home/Z6HJB/roberta-large-scandi/model_continued2" \
    --tokenizer_name="${MODEL_DIR}" \
    --dataset_name="mc4" \
    --dataset_config_name="unshuffled_deduplicated_en" \
    --max_seq_length="128" \
    --per_device_train_batch_size="128" \
    --per_device_eval_batch_size="128" \
    --learning_rate="3e-4" \
    --warmup_steps="1000" \
    --overwrite_output_dir \
    --adam_beta1="0.9" \
    --adam_beta2="0.98" \
    --num_train_steps="1000000" \
    --num_eval_samples="5000" \
    --save_steps="1000" \
    --logging_steps="25" \
    --eval_steps="1000" \
    --push_to_hub \
    #--config_name="${MODEL_DIR}" \
    #--model_type="roberta" \
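
Before launching, it may help to confirm that the directory passed as `--model_name_or_path` really contains the two files named above. This is a minimal sketch using only the Python standard library; `missing_checkpoint_files` and `EXPECTED_FILES` are hypothetical names introduced here, not part of transformers or of the script.

```python
from pathlib import Path

# File names taken from the description of the checkpoint directory above.
EXPECTED_FILES = ("config.json", "flax_model.msgpack")


def missing_checkpoint_files(model_dir):
    """Return the expected checkpoint files that are absent from model_dir."""
    directory = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (directory / name).is_file()]


if __name__ == "__main__":
    import sys

    # Usage: python check_checkpoint.py /path/to/model_dir
    missing = missing_checkpoint_files(sys.argv[1])
    print("missing files:", missing if missing else "none")
```

An empty result only shows that the files exist; it does not show that the script actually reads the weights from them.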

Let me know if this suffices or if you need more!

I might be busy for the rest of the day since I have a football match to watch 🇩🇰 🇩🇰 🇩🇰 🇩🇰 🇩🇰 🇩🇰

NielsRogge commented, Jul 7, 2021

Can you post a code snippet or make a Colab to reproduce the error?