
ValueError: Couldn't instantiate the backend tokenizer while loading model tokenizer


Environment info

  • transformers version: 4.2.2
  • Platform: Colab
  • Python version:
  • PyTorch version (GPU?):
  • Tensorflow version (GPU?):
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help

@mfuntowicz @patrickvonplaten

Information

Model I am using (Bert, XLNet …): T5

The problem arises when using:

  • my own modified scripts: https://github.com/allenai/unifiedqa (loading the model from that repo fails at the tokenizer step, as shown below)

The task I am working on is:

  • my own task or dataset

To reproduce

Steps to reproduce the behavior:

  1. Follow the instructions at https://github.com/allenai/unifiedqa to get the sample code.
  2. Copy-paste it into Colab and run it:

from transformers import AutoTokenizer, T5ForConditionalGeneration

model_name = "allenai/unifiedqa-t5-small" # you can specify the model size here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

def run_model(input_string, **generator_args):
    # Encode the input, generate an answer, and decode it back to text.
    input_ids = tokenizer.encode(input_string, return_tensors="pt")
    res = model.generate(input_ids, **generator_args)
    return tokenizer.batch_decode(res, skip_special_tokens=True)
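
For reference, the UnifiedQA repo drives this helper with a plain-text question followed by the answer options; the example question below is illustrative, and the literal \n separator between question and options follows the convention shown in that repo:

# hypothetical example input; UnifiedQA expects the question and the
# answer options in a single string, separated by a literal "\n"
run_model("which is the best conductor? \\n (a) iron (b) feather")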

Expected behavior

The code above should load the tokenizer and model without errors.

Error

Instead, the following error is raised:

ValueError                                Traceback (most recent call last)
<ipython-input-4-ee10e1c1c77e> in <module>()
      2 
      3 model_name = "allenai/unifiedqa-t5-small" # you can specify the model size here
----> 4 tokenizer = AutoTokenizer.from_pretrained(model_name)
      5 model = T5ForConditionalGeneration.from_pretrained(model_name)
      6 

4 frames
/usr/local/lib/python3.6/dist-packages/transformers/tokenization_utils_fast.py in __init__(self, *args, **kwargs)
     94         else:
     95             raise ValueError(
---> 96                 "Couldn't instantiate the backend tokenizer from one of: "
     97                 "(1) a `tokenizers` library serialization file, "
     98                 "(2) a slow tokenizer instance to convert or "

ValueError: Couldn't instantiate the backend tokenizer from one of: (1) a `tokenizers` library serialization file, (2) a slow tokenizer instance to convert or (3) an equivalent slow tokenizer class to instantiate and convert. You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
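
The last sentence of the message points at the root cause: T5's tokenizer is SentencePiece-based, and transformers needs the sentencepiece package to convert the slow tokenizer into a fast one. A minimal diagnostic sketch (the check itself is illustrative, assuming a Colab environment where sentencepiece is not preinstalled):

import importlib.util

# AutoTokenizer builds the fast T5 tokenizer by converting the
# SentencePiece-based slow tokenizer; without the sentencepiece
# package that conversion cannot run, hence the ValueError above.
if importlib.util.find_spec("sentencepiece") is None:
    print("sentencepiece is missing - this is what triggers the ValueError")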

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

67 reactions
rsanjaykamath commented, Jan 25, 2021

Ok got it. Installing sentencepiece and restarting the kernel did the trick for me.

Thanks for your help 😃 Closing the issue.
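
In Colab, the fix described above amounts to the following (a minimal sketch: the install command and kernel restart are the steps rsanjaykamath mentions, and the loading code matches the reproduction above):

!pip install sentencepiece
# Restart the runtime (Runtime > Restart runtime) so the freshly
# installed package is importable, then re-run the loading code:
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_name = "allenai/unifiedqa-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)  # now succeeds
model = T5ForConditionalGeneration.from_pretrained(model_name)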

19 reactions
patrickvonplaten commented, Jan 25, 2021

I see, it’s the classic sentencepiece error; I should have read your error message more carefully 😉

Here is a Colab showing how it works: https://colab.research.google.com/drive/1QybYdj-1bW0MHD0cutWBPWas5IFEhSjC?usp=sharing
