
Index Out of Range Error in tokenization using TF Hub for Pretrained Albert Models

See original GitHub issue

I am getting an Index Out of Range error in tokenization.py when fine-tuning an ALBERT large model with TF Hub. I printed out the vocab file and the token right before the error; the error and print-outs are shown below.

Vocab File: b'/tmp/tfhub_modules/c88f9d4ac7469966b2fab3b577a8031ae23e125a/assets/30k-clean.model'
Token:  

Traceback (most recent call last):
  File "/home/[user]/anaconda3/envs/tf-gpu/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/[user]/anaconda3/envs/tf-gpu/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/[user]/Documents/ml-tests/falling-albert/albert/run_classifier_with_tfhub.py", line 318, in <module>
    tf.compat.v1.app.run()
  File "/home/[user]/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/home/[user]/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/[user]/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/home/[user]/Documents/ml-tests/falling-albert/albert/run_classifier_with_tfhub.py", line 185, in main
    tokenizer = create_tokenizer_from_hub_module(FLAGS.albert_hub_module_handle)
  File "/home/[user]/Documents/ml-tests/falling-albert/albert/run_classifier_with_tfhub.py", line 161, in create_tokenizer_from_hub_module
    spm_model_file=FLAGS.spm_model_file)
  File "/home/[user]/Documents/ml-tests/falling-albert/albert/tokenization.py", line 249, in __init__
    self.vocab = load_vocab(vocab_file)
  File "/home/[user]/Documents/ml-tests/falling-albert/albert/tokenization.py", line 203, in load_vocab
    token = token.strip().split()[0]
IndexError: list index out of range
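
The traceback already points at the cause: load_vocab in tokenization.py reads the supplied file line by line and takes token.strip().split()[0] for every line, so any line that strips down to nothing raises IndexError. The path being loaded, 30k-clean.model, is a binary SentencePiece model rather than a one-token-per-line text vocabulary, so such empty-after-strip lines are inevitable. Below is a minimal sketch of that code path; the loop is reconstructed for illustration and is not a verbatim copy of tokenization.py.

import collections


def load_vocab(vocab_file):
  """Expects a plain-text vocabulary with one token per line."""
  vocab = collections.OrderedDict()
  with open(vocab_file, "rb") as reader:
    for index, line in enumerate(reader):
      # Line 203 of the traceback: a binary SentencePiece file such as
      # 30k-clean.model has "lines" that strip to b"", so split() returns
      # [] and indexing [0] raises IndexError: list index out of range.
      token = line.strip().split()[0]
      vocab[token] = index
  return vocab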

Albert Finetune Shell Script

#!/bin/bash
pip install -r albert/requirements.txt
python -m albert.run_classifier_with_tfhub \
--albert_hub_module_handle=https://tfhub.dev/google/albert_xlarge/1 \
--task_name=cola \
--do_train=true \
--do_eval=true  \
--data_dir=./data-to-albert \
--max_seq_length=128  \
--train_batch_size=32  \
--learning_rate=2e-05 \
--num_train_epochs=3.0  \
--output_dir=./checkpoints/test

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 4
  • Comments: 9

Top GitHub Comments

5 reactions
KodairaTomonori commented, Oct 28, 2019

‘30k-clean.model’ is ‘spm_model_file’, not ‘vocab_file’. The code changes are:

diff --git a/albert/run_classifier_with_tfhub.py b/albert/run_classifier_with_tfhub.py
index 92fef74..26f4339 100644                                                         
--- a/albert/run_classifier_with_tfhub.py                                             
+++ b/albert/run_classifier_with_tfhub.py                                             
@@ -156,6 +156,7 @@ def create_tokenizer_from_hub_module(albert_hub_module_handle):   
     with tf.Session() as sess:                                                       
       vocab_file, do_lower_case = sess.run([tokenization_info["vocab_file"],         
                                             tokenization_info["do_lower_case"]])     
+    FLAGS.spm_model_file = vocab_file                                                
   return tokenization.FullTokenizer(                                                 
       vocab_file=vocab_file, do_lower_case=do_lower_case,                            
       spm_model_file=FLAGS.spm_model_file)                                           
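
In other words, the path returned by the Hub module's tokenization_info signature already points at the SentencePiece model; it just has to reach FullTokenizer as spm_model_file instead of being parsed as a text vocab. A rough sketch of the patched helper follows (TF 1.x and tensorflow_hub assumed); everything outside the diff hunk above is reconstructed from the ALBERT repository and may differ slightly in your checkout.

import tensorflow as tf
import tensorflow_hub as hub

from albert import tokenization


def create_tokenizer_from_hub_module(albert_hub_module_handle):
  """Builds a FullTokenizer from the Hub module's tokenization_info."""
  with tf.Graph().as_default():
    albert_module = hub.Module(albert_hub_module_handle)
    tokenization_info = albert_module(signature="tokenization_info",
                                      as_dict=True)
    with tf.Session() as sess:
      vocab_file, do_lower_case = sess.run([tokenization_info["vocab_file"],
                                            tokenization_info["do_lower_case"]])
  # The "vocab_file" the ALBERT Hub module exposes is really the
  # SentencePiece model (30k-clean.model), so hand it to FullTokenizer as
  # spm_model_file; the diff above does the same thing by overwriting
  # FLAGS.spm_model_file before the call.
  return tokenization.FullTokenizer(
      vocab_file=vocab_file, do_lower_case=do_lower_case,
      spm_model_file=vocab_file)

If your copy of run_classifier_with_tfhub.py defines a --spm_model_file flag (line 161 of the traceback suggests it does), pointing that flag at the module's assets/30k-clean.model may also work as a command-line workaround, assuming FullTokenizer takes the SentencePiece path whenever the flag is set.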
1 reaction
mnsrmov commented, Oct 31, 2019

Upgrade your tensorflow to at least 1.15.
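
If it is unclear which TensorFlow version the script actually runs against, a quick sanity check (plain Python, nothing ALBERT-specific) is:

import tensorflow as tf

# The ALBERT TF Hub module reportedly needs TensorFlow >= 1.15 per the
# comment above (>= 1.14 according to the TF Hub answer listed below).
version = tuple(int(part) for part in tf.__version__.split(".")[:2])
assert version >= (1, 15), (
    "Found TensorFlow %s; upgrade to at least 1.15." % tf.__version__)
print("TensorFlow", tf.__version__, "looks new enough.")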


Top Results From Across the Web

List index out of range while saving a trained model
I'm trying to fine-tune a pre-trained DistilBERT model from Huggingface using Tensorflow. Everything runs smoothly and the model builds and ...

Getting IndexError: list index out of range when fine-tuning
Hi everyone! I want to fine-tune my pre-trained Longformer model and am getting this error:-

Albert in Keras tf2 using huggingface - Explained - Kaggle
So, we just have to fine tune the model to suit our purpose. What it means - the layer has been trained for...

Problem with ALBERT pretrained model on TF Hub
It looks like you answered your own question, but to make this more obvious for others: You need to use tensorflow >= 1.14.0....

transformers · PyPI
State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow. Transformers provides thousands of pretrained models to perform tasks on different ...
