error when trying to use multilingual model for fine tuning
I want to fine-tune on Hindi-language data. I tried passing bert-base-multilingual as the model, but I get the following error:
python pregenerate_training_data.py --train_corpus=./hindi_pytorch_bert_data_1.txt --bert_model=bert-base-multilingual --output_dir=./hindi_train_data_1_3epochs/ --epochs_to_generate=3
Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
Model name 'bert-base-multilingual' was not found in model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese). We assumed 'bert-base-multilingual' was a path or url but couldn't find any file associated to this path or url.
Traceback (most recent call last):
File "pregenerate_training_data.py", line 292, in <module>
main()
File "pregenerate_training_data.py", line 255, in main
vocab_list = list(tokenizer.vocab.keys())
AttributeError: 'NoneType' object has no attribute 'vocab'
I also tried bert-base-multilingual-cased, but then I ran into this error:
python pregenerate_training_data.py --train_corpus=./hindi_pytorch_bert_data_1.txt --bert_model=bert-base-multilingual-cased --output_dir=./hindi_train_data_1_3epochs/ --epochs_to_generate=3
Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
usage: pregenerate_training_data.py [-h] --train_corpus TRAIN_CORPUS
--output_dir OUTPUT_DIR --bert_model
{bert-base-uncased,bert-large-uncased,bert-base-cased,bert-base-multilingual,bert-base-chinese}
[--do_lower_case] [--reduce_memory]
[--epochs_to_generate EPOCHS_TO_GENERATE]
[--max_seq_len MAX_SEQ_LEN]
[--short_seq_prob SHORT_SEQ_PROB]
[--masked_lm_prob MASKED_LM_PROB]
[--max_predictions_per_seq MAX_PREDICTIONS_PER_SEQ]
pregenerate_training_data.py: error: argument --bert_model: invalid choice: 'bert-base-multilingual-cased' (choose from 'bert-base-uncased', 'bert-large-uncased', 'bert-base-cased', 'bert-base-multilingual', 'bert-base-chinese')
How to resolve this issue?
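Comparing the two error messages above suggests the script's --bert_model choices and the tokenizer's model name list are out of sync. Below is a minimal sketch of that mismatch and one possible workaround, assuming the fix is simply to let argparse accept the names the tokenizer knows. The two lists are copied from the messages above; SCRIPT_CHOICES, TOKENIZER_NAMES and build_parser are hypothetical names for illustration, not code from pregenerate_training_data.py.

```python
import argparse

# Names accepted by the script's --bert_model flag (copied from the usage message above).
SCRIPT_CHOICES = [
    "bert-base-uncased", "bert-large-uncased", "bert-base-cased",
    "bert-base-multilingual", "bert-base-chinese",
]

# Names the tokenizer reported as known (copied from the first error message above).
TOKENIZER_NAMES = [
    "bert-base-uncased", "bert-large-uncased", "bert-base-cased", "bert-large-cased",
    "bert-base-multilingual-uncased", "bert-base-multilingual-cased", "bert-base-chinese",
]

# Accepted by argparse but unknown to the tokenizer: passes the flag check, then fails at load time.
stale = sorted(set(SCRIPT_CHOICES) - set(TOKENIZER_NAMES))

# Known to the tokenizer but rejected by argparse: fails with "invalid choice".
missing = sorted(set(TOKENIZER_NAMES) - set(SCRIPT_CHOICES))


def build_parser(choices):
    """Hypothetical stand-in for the script's parser, reduced to the one flag that matters here."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--bert_model", type=str, required=True, choices=choices)
    return parser


# Assumed workaround: use the tokenizer's names as the allowed choices.
patched = build_parser(TOKENIZER_NAMES)
args = patched.parse_args(["--bert_model=bert-base-multilingual-cased"])

print("stale choices:", stale)
print("missing choices:", missing)
print("accepted:", args.bert_model)
```

If this reading is right, editing the choices list in a local copy of the script (or upgrading to a version where the list has been updated) should let bert-base-multilingual-cased through. Whether the rest of the script then runs on the Hindi corpus is not something the logs above can confirm.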
Issue Analytics
- State:
- Created 4 years ago
- Comments:6
Hi, I followed your code, and got this error:
| 6796/185072 [00:00<00:18, 9787.42it/s]
Traceback (most recent call last):
File "pregenerate_training_data.py", line 308, in <module>
main()
File "pregenerate_training_data.py", line 293, in main
vocab_list=vocab_list)
File "pregenerate_training_data.py", line 208, in create_instances_from_document
assert len(tokens_b) >= 1
AssertionError
Can you please share your code?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.