
Index Out of Range Error in tokenization using TF Hub for Pretrained Albert Models

See original GitHub issue

I am getting an Index Out of Range error in tokenization.py when fine-tuning an ALBERT large model with TF Hub. I printed out the vocab file and the token right before the error; the error and print-outs are shown below.

Vocab File: b'/tmp/tfhub_modules/c88f9d4ac7469966b2fab3b577a8031ae23e125a/assets/30k-clean.model'
Token:  

Traceback (most recent call last):
  File "/home/[user]/anaconda3/envs/tf-gpu/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/[user]/anaconda3/envs/tf-gpu/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/[user]/Documents/ml-tests/falling-albert/albert/run_classifier_with_tfhub.py", line 318, in <module>
    tf.compat.v1.app.run()
  File "/home/[user]/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/home/[user]/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/[user]/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/home/[user]/Documents/ml-tests/falling-albert/albert/run_classifier_with_tfhub.py", line 185, in main
    tokenizer = create_tokenizer_from_hub_module(FLAGS.albert_hub_module_handle)
  File "/home/[user]/Documents/ml-tests/falling-albert/albert/run_classifier_with_tfhub.py", line 161, in create_tokenizer_from_hub_module
    spm_model_file=FLAGS.spm_model_file)
  File "/home/[user]/Documents/ml-tests/falling-albert/albert/tokenization.py", line 249, in __init__
    self.vocab = load_vocab(vocab_file)
  File "/home/[user]/Documents/ml-tests/falling-albert/albert/tokenization.py", line 203, in load_vocab
    token = token.strip().split()[0]
IndexError: list index out of range
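
The traceback already points at the cause: load_vocab in tokenization.py reads the supplied file line by line and takes token.strip().split()[0] for every line, so any line that strips down to nothing raises IndexError. The path being loaded, 30k-clean.model, is a binary SentencePiece model rather than a one-token-per-line text vocabulary, so such empty-after-strip lines are inevitable. Below is a minimal sketch of that code path; the loop is reconstructed for illustration and is not a verbatim copy of tokenization.py.

import collections


def load_vocab(vocab_file):
  """Expects a plain-text vocabulary with one token per line."""
  vocab = collections.OrderedDict()
  with open(vocab_file, "rb") as reader:
    for index, line in enumerate(reader):
      # Line 203 of the traceback: a binary SentencePiece file such as
      # 30k-clean.model has "lines" that strip to b"", so split() returns
      # [] and indexing [0] raises IndexError: list index out of range.
      token = line.strip().split()[0]
      vocab[token] = index
  return vocab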

Albert Finetune Shell Script

#!/bin/bash
pip install -r albert/requirements.txt
python -m albert.run_classifier_with_tfhub \
--albert_hub_module_handle=https://tfhub.dev/google/albert_xlarge/1 \
--task_name=cola \
--do_train=true \
--do_eval=true  \
--data_dir=./data-to-albert \
--max_seq_length=128  \
--train_batch_size=32  \
--learning_rate=2e-05 \
--num_train_epochs=3.0  \
--output_dir=./checkpoints/test

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 4
  • Comments: 9

Top GitHub Comments

5 reactions
KodairaTomonori commented, Oct 28, 2019

‘30k-clean.model’ is ‘spm_model_file’, not ‘vocab_file’. The code changes are:

diff --git a/albert/run_classifier_with_tfhub.py b/albert/run_classifier_with_tfhub.py
index 92fef74..26f4339 100644                                                         
--- a/albert/run_classifier_with_tfhub.py                                             
+++ b/albert/run_classifier_with_tfhub.py                                             
@@ -156,6 +156,7 @@ def create_tokenizer_from_hub_module(albert_hub_module_handle):   
     with tf.Session() as sess:                                                       
       vocab_file, do_lower_case = sess.run([tokenization_info["vocab_file"],         
                                             tokenization_info["do_lower_case"]])     
+    FLAGS.spm_model_file = vocab_file                                                
   return tokenization.FullTokenizer(                                                 
       vocab_file=vocab_file, do_lower_case=do_lower_case,                            
       spm_model_file=FLAGS.spm_model_file)                                           
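
In other words, the path returned by the Hub module's tokenization_info signature already points at the SentencePiece model; it just has to reach FullTokenizer as spm_model_file instead of being parsed as a text vocab. A rough sketch of the patched helper follows (TF 1.x and tensorflow_hub assumed); everything outside the diff hunk above is reconstructed from the ALBERT repository and may differ slightly in your checkout.

import tensorflow as tf
import tensorflow_hub as hub

from albert import tokenization


def create_tokenizer_from_hub_module(albert_hub_module_handle):
  """Builds a FullTokenizer from the Hub module's tokenization_info."""
  with tf.Graph().as_default():
    albert_module = hub.Module(albert_hub_module_handle)
    tokenization_info = albert_module(signature="tokenization_info",
                                      as_dict=True)
    with tf.Session() as sess:
      vocab_file, do_lower_case = sess.run([tokenization_info["vocab_file"],
                                            tokenization_info["do_lower_case"]])
  # The "vocab_file" the ALBERT Hub module exposes is really the
  # SentencePiece model (30k-clean.model), so hand it to FullTokenizer as
  # spm_model_file; the diff above does the same thing by overwriting
  # FLAGS.spm_model_file before the call.
  return tokenization.FullTokenizer(
      vocab_file=vocab_file, do_lower_case=do_lower_case,
      spm_model_file=vocab_file)

If your copy of run_classifier_with_tfhub.py defines a --spm_model_file flag (line 161 of the traceback suggests it does), pointing that flag at the module's assets/30k-clean.model may also work as a command-line workaround, assuming FullTokenizer takes the SentencePiece path whenever the flag is set.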
1 reaction
mnsrmov commented, Oct 31, 2019

Upgrade your tensorflow to at least 1.15.
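
If it is unclear which TensorFlow version the script actually runs against, a quick sanity check (plain Python, nothing ALBERT-specific) is:

import tensorflow as tf

# The ALBERT TF Hub module reportedly needs TensorFlow >= 1.15 per the
# comment above (>= 1.14 according to the TF Hub answer listed below).
version = tuple(int(part) for part in tf.__version__.split(".")[:2])
assert version >= (1, 15), (
    "Found TensorFlow %s; upgrade to at least 1.15." % tf.__version__)
print("TensorFlow", tf.__version__, "looks new enough.")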


Top Results From Across the Web

List index out of range while saving a trained model
I'm trying to fine-tune a pre-trained DistilBERT model from Huggingface using Tensorflow. Everything runs smoothly and the model builds and ...

Getting IndexError: list index out of range when fine-tuning
Hi everyone! I want to fine-tune my pre-trained Longformer model and am getting this error:-

Albert in Keras tf2 using huggingface - Explained - Kaggle
So, we just have to fine tune the model to suit our purpose. What it means - the layer has been trained for...

Problem with ALBERT pretrained model on TF Hub
It looks like you answered your own question, but to make this more obvious for others: You need to use tensorflow >= 1.14.0....

transformers · PyPI
State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow. Transformers provides thousands of pretrained models to perform tasks on different ...
