
How can I continue finetuning from checkpoint using the NER script?


❓ Questions & Help

Details

I’m trying to run the run_ner.py script, but everything I tried to continue fine-tuning from a checkpoint failed. Any ideas?

I run it in Google Colab. Below is the content of the cell I run:

%cd "/content/drive/My Drive/Colab Notebooks/NER/Batteria/transformers-master_2020_04_27"
%pip install .
%pip install --upgrade .
%pip install seqeval
from fastai import * 
from transformers import *
%cd "/content/drive/My Drive/Colab Notebooks/NER/Batteria/transformers-master_2020_04_27/examples/ner"

!python "/content/drive/My Drive/Colab Notebooks/NER/Batteria/transformers-master_2020_04_27/examples/ner/run_ner.py" --data_dir ./ \
                                                                                                          --model_type bert \
                                                                                                          --labels ./labels.txt \
                                                                                                          --model_name_or_path "/content/drive/My Drive/Colab Notebooks/NER/Batteria/transformers-master_2020_04_27/examples/ner/bert-base-256/checkpoint-10000" \
                                                                                                          --output_dir "/content/drive/My Drive/Colab Notebooks/NER/Batteria/transformers-master_2020_04_27/examples/ner/bert-base-256/check" \
                                                                                                          --max_seq_length "256" \
                                                                                                          --num_train_epochs "5" \
                                                                                                          --per_gpu_train_batch_size "4" \
                                                                                                          --save_steps "10000" \
                                                                                                          --seed "1" \
                                                                                                          --do_train --do_eval --do_predict

As you can see, I already tried replacing the model_name_or_path parameter value (which was “bert-base-cased”) with the checkpoint directory, but several errors occurred complaining about the model name and missing files.

04/28/2020 15:16:36 - INFO - transformers.tokenization_utils -   Model name '/content/drive/My Drive/Colab Notebooks/NER/Batteria/transformers-master_2020_04_27/examples/ner/bert-base-256/checkpoint-10000' not found in model shortcut name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, bert-base-finnish-cased-v1, bert-base-finnish-uncased-v1, bert-base-dutch-cased). Assuming '/content/drive/My Drive/Colab Notebooks/NER/Batteria/transformers-master_2020_04_27/examples/ner/bert-base-256/checkpoint-10000' is a path, a model identifier, or url to a directory containing tokenizer files.
04/28/2020 15:16:36 - INFO - transformers.tokenization_utils -   Didn't find file /content/drive/My Drive/Colab Notebooks/NER/Batteria/transformers-master_2020_04_27/examples/ner/bert-base-256/checkpoint-10000/vocab.txt. We won't load it.
04/28/2020 15:16:36 - INFO - transformers.tokenization_utils -   Didn't find file /content/drive/My Drive/Colab Notebooks/NER/Batteria/transformers-master_2020_04_27/examples/ner/bert-base-256/checkpoint-10000/added_tokens.json. We won't load it.
04/28/2020 15:16:36 - INFO - transformers.tokenization_utils -   Didn't find file /content/drive/My Drive/Colab Notebooks/NER/Batteria/transformers-master_2020_04_27/examples/ner/bert-base-256/checkpoint-10000/special_tokens_map.json. We won't load it.
04/28/2020 15:16:36 - INFO - transformers.tokenization_utils -   Didn't find file /content/drive/My Drive/Colab Notebooks/NER/Batteria/transformers-master_2020_04_27/examples/ner/bert-base-256/checkpoint-10000/tokenizer_config.json. We won't load it.
Traceback (most recent call last):
  File "/content/drive/My Drive/Colab Notebooks/NER/Batteria/transformers-master_2020_04_27/examples/ner/run_ner.py", line 290, in <module>
    main()
  File "/content/drive/My Drive/Colab Notebooks/NER/Batteria/transformers-master_2020_04_27/examples/ner/run_ner.py", line 149, in main
    use_fast=model_args.use_fast,
  File "/usr/local/lib/python3.6/dist-packages/transformers/tokenization_auto.py", line 197, in from_pretrained
    return tokenizer_class_py.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/tokenization_utils.py", line 868, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/tokenization_utils.py", line 971, in _from_pretrained
    list(cls.vocab_files_names.values()),
OSError: Model name '/content/drive/My Drive/Colab Notebooks/NER/Batteria/transformers-master_2020_04_27/examples/ner/bert-base-256/checkpoint-10000' was not found in tokenizers model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, bert-base-finnish-cased-v1, bert-base-finnish-uncased-v1, bert-base-dutch-cased). We assumed '/content/drive/My Drive/Colab Notebooks/NER/Batteria/transformers-master_2020_04_27/examples/ner/bert-base-256/checkpoint-10000' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.

Thank you in advance.

A link to the original question on Stack Overflow: https://stackoverflow.com/questions/61482518/how-can-i-continue-finetuning-from-checkpoint-using-the-ner-script

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

1 reaction
LysandreJik commented, Apr 29, 2020

Ah, I think I know where the issue stems from. We’ve changed the paradigm for our examples, which now rely on a Trainer abstraction. Until c811526, merged five days ago, the tokenizer was unfortunately not saved; this is fixed now.

I would recommend you use the latest script so that this doesn’t happen anymore.
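Roughly, the fix amounts to saving the tokenizer alongside the model whenever a checkpoint is written, so that the checkpoint folder is self-contained and can be passed straight to --model_name_or_path. A minimal sketch of that idea (not the actual example code; the path is illustrative):

from transformers import BertForTokenClassification, BertTokenizer

checkpoint_dir = "bert-base-256/checkpoint-10000"  # illustrative path

model = BertForTokenClassification.from_pretrained("bert-base-cased")
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")

# Saving both pieces makes the checkpoint folder self-contained:
# config.json and pytorch_model.bin come from the model,
# vocab.txt, tokenizer_config.json and special_tokens_map.json from the tokenizer.
model.save_pretrained(checkpoint_dir)
tokenizer.save_pretrained(checkpoint_dir)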

To fix this error, you could manually save your tokenizer in that folder like so:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
tokenizer.save_pretrained("/content/drive/My Drive/Colab Notebooks/NER/Batteria/transformers-master_2020_04_27/examples/ner/bert-base-256/checkpoint-10000")

This will save the tokenizer files in the appropriate folder. Please keep in mind that this reloads the original bert-base-cased tokenizer. If you have modified your tokenizer in any way, you should save that modified tokenizer in the aforementioned folder instead. Please let me know if I can be of further help!
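For instance, if you had added extra tokens before training, a sketch of saving and later reloading that modified tokenizer from the checkpoint folder could look like this (the added tokens and the shortened path are purely illustrative):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
# Hypothetical domain-specific tokens; use whatever you actually added before training.
tokenizer.add_tokens(["[BATTERY]", "[ANODE]"])

# Save into the checkpoint folder so run_ner.py loads model and tokenizer from the same path.
tokenizer.save_pretrained("bert-base-256/checkpoint-10000")

# Later, the tokenizer (including the added tokens) is restored from that folder.
tokenizer = BertTokenizer.from_pretrained("bert-base-256/checkpoint-10000")

Note that if you do add new tokens, the model's embedding matrix also has to be resized before training, e.g. with model.resize_token_embeddings(len(tokenizer)).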

0 reactions
stale[bot] commented, Jul 6, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.


Top Results From Across the Web

How can I continue finetuning from checkpoint using ...
I'm trying to execute this script using run_ner.py but everything I tried to continue fine tuning from checkpoint failed. Any ideas?

fine-tuning with Trainer() after completing the initial training ...
Yes, you will need to restart a new training with new training arguments, since you are not resuming from a checkpoint.

Tutorial 2: Finetuning Models - MMFlow's documentation!
This tutorial provides instruction for users to use the models provided in the Model Zoo for other datasets to obtain better performance.

How to Checkpoint Deep Learning Models in Keras
A good use of checkpointing is to output the model weights each time an improvement is observed during training. The example below creates...

Saving and loading a general checkpoint in PyTorch
A common PyTorch convention is to save these checkpoints using the .tar file extension. To load the items, first initialize the model and...
