Index Out of Range Error in tokenization using TF Hub for Pretrained Albert Models
I am getting an "IndexError: list index out of range" in tokenization.py when fine-tuning an ALBERT model with TF Hub. I printed the vocab file path and the token just before the error; the print-outs and traceback are below.
Vocab File: b'/tmp/tfhub_modules/c88f9d4ac7469966b2fab3b577a8031ae23e125a/assets/30k-clean.model'
Token:
Traceback (most recent call last):
File "/home/[user]/anaconda3/envs/tf-gpu/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/[user]/anaconda3/envs/tf-gpu/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/[user]/Documents/ml-tests/falling-albert/albert/run_classifier_with_tfhub.py", line 318, in <module>
tf.compat.v1.app.run()
File "/home/[user]/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/home/[user]/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/home/[user]/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "/home/[user]/Documents/ml-tests/falling-albert/albert/run_classifier_with_tfhub.py", line 185, in main
tokenizer = create_tokenizer_from_hub_module(FLAGS.albert_hub_module_handle)
File "/home/[user]/Documents/ml-tests/falling-albert/albert/run_classifier_with_tfhub.py", line 161, in create_tokenizer_from_hub_module
spm_model_file=FLAGS.spm_model_file)
File "/home/[user]/Documents/ml-tests/falling-albert/albert/tokenization.py", line 249, in __init__
self.vocab = load_vocab(vocab_file)
File "/home/[user]/Documents/ml-tests/falling-albert/albert/tokenization.py", line 203, in load_vocab
token = token.strip().split()[0]
IndexError: list index out of range
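For context, here is a minimal sketch of the failure mode (an approximation of load_vocab, not the repo's exact code): the function reads the file line by line and takes the first whitespace-separated field, so when it is handed 30k-clean.model, a binary SentencePiece model rather than a one-token-per-line vocab, it eventually hits a line that strips to nothing and split()[0] raises the IndexError.
def load_vocab(vocab_file):
    # Sketch only: assumes a plain-text vocab with one token per line.
    vocab = {}
    with open(vocab_file, "rb") as reader:
        for index, line in enumerate(reader):
            fields = line.strip().split()
            # On a binary SentencePiece model some "lines" strip to nothing,
            # so fields[0] (the repo's token.strip().split()[0]) raises
            # IndexError: list index out of range.
            token = fields[0]
            vocab[token] = index
    return vocab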
Albert Finetune Shell Script
#!/bin/bash
pip install -r albert/requirements.txt
python -m albert.run_classifier_with_tfhub \
--albert_hub_module_handle=https://tfhub.dev/google/albert_xlarge/1 \
--task_name=cola \
--do_train=true \
--do_eval=true \
--data_dir=./data-to-albert \
--max_seq_length=128 \
--train_batch_size=32 \
--learning_rate=2e-05 \
--num_train_epochs=3.0 \
--output_dir=./checkpoints/test
'30k-clean.model' is the spm_model_file, not the vocab_file, so the tokenizer has to load it as a SentencePiece model instead of feeding it to load_vocab (a sketch of that change follows below).
Also upgrade your tensorflow to at least 1.15.
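A minimal sketch of the tokenizer change, assuming the FullTokenizer signature from the google-research/albert repo (vocab_file, do_lower_case, spm_model_file), the TF1-style hub.Module API that run_classifier_with_tfhub.py already uses, and TensorFlow 1.15; the point is to route the Hub module's 30k-clean.model asset to spm_model_file so the SentencePiece path is taken instead of load_vocab:
import tensorflow as tf  # TF 1.15
import tensorflow_hub as hub
from albert import tokenization  # package layout from the shell script above

def create_tokenizer_from_hub_module(albert_hub_module_handle):
    """Builds a tokenizer from the assets exported by the ALBERT Hub module."""
    with tf.Graph().as_default():
        albert_module = hub.Module(albert_hub_module_handle)
        tokenization_info = albert_module(signature="tokenization_info",
                                          as_dict=True)
        with tf.Session() as sess:
            vocab_file, do_lower_case = sess.run(
                [tokenization_info["vocab_file"],
                 tokenization_info["do_lower_case"]])
    # sess.run returns the asset path as bytes (as in the print-out above).
    if isinstance(vocab_file, bytes):
        vocab_file = vocab_file.decode("utf-8")
    # The asset is 30k-clean.model, a SentencePiece model, so pass it as
    # spm_model_file; load_vocab() cannot parse it as a plain-text vocab.
    return tokenization.FullTokenizer(
        vocab_file=vocab_file, do_lower_case=do_lower_case,
        spm_model_file=vocab_file)
With that in place, FullTokenizer should take its SentencePiece branch and load_vocab is never called on the binary model file.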