ValueError: Couldn't instantiate the backend tokenizer while loading model tokenizer
See original GitHub issue

Environment info
- transformers version: 4.2.2
- Platform: Colab
- Python version:
- PyTorch version (GPU?):
- Tensorflow version (GPU?):
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
Who can help
Information
Model I am using (Bert, XLNet …): T5

The problem arises when using:
- the official example scripts: (give details below)
- my own modified scripts: (give details below) https://github.com/allenai/unifiedqa (loading the model mentioned there for the tokenizer does not work)

The tasks I am working on are:
- an official GLUE/SQuAD task: (give the name)
- my own task or dataset: (give details below)
To reproduce
Steps to reproduce the behavior:
- Follow the instructions here https://github.com/allenai/unifiedqa to get the sample code
- Copy-paste it into Colab and run it.
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_name = "allenai/unifiedqa-t5-small"  # you can specify the model size here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

def run_model(input_string, **generator_args):
    input_ids = tokenizer.encode(input_string, return_tensors="pt")
    res = model.generate(input_ids, **generator_args)
    return tokenizer.batch_decode(res, skip_special_tokens=True)
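As an aside, the UnifiedQA README formats inputs as a lowercased question joined to its options/context with " \n ". A minimal hypothetical helper (the function name and exact formatting are my assumption based on that README, not part of the issue's code) might look like:

```python
def format_unifiedqa_input(question, context):
    # UnifiedQA expects lowercased text with the question and its
    # context/options separated by " \n " (assumption from the
    # UnifiedQA README, not from this issue's snippet).
    return f"{question} \n {context}".lower()

# The result would then be passed to run_model() above.
print(format_unifiedqa_input("Which is the best conductor?", "(a) iron (b) feather"))
```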
Expected behavior
The code above should load the tokenizer and model without errors.
Error
Instead, the following error is raised:
ValueError Traceback (most recent call last)
<ipython-input-4-ee10e1c1c77e> in <module>()
2
3 model_name = "allenai/unifiedqa-t5-small" # you can specify the model size here
----> 4 tokenizer = AutoTokenizer.from_pretrained(model_name)
5 model = T5ForConditionalGeneration.from_pretrained(model_name)
6
4 frames
/usr/local/lib/python3.6/dist-packages/transformers/tokenization_utils_fast.py in __init__(self, *args, **kwargs)
94 else:
95 raise ValueError(
---> 96 "Couldn't instantiate the backend tokenizer from one of: "
97 "(1) a `tokenizers` library serialization file, "
98 "(2) a slow tokenizer instance to convert or "
ValueError: Couldn't instantiate the backend tokenizer from one of: (1) a `tokenizers` library serialization file, (2) a slow tokenizer instance to convert or (3) an equivalent slow tokenizer class to instantiate and convert. You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
Issue Analytics
- Created 3 years ago
- Reactions: 1
- Comments: 5 (3 by maintainers)
Ok got it. Installing sentencepiece and restarting the kernel did the trick for me.
Thanks for your help 😃 Closing the issue.
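The fix described above (install sentencepiece, then restart the kernel) can be checked for programmatically. A small sketch, assuming only the Python standard library; the `has_module` helper is hypothetical, not part of the original thread:

```python
import importlib.util

def has_module(name):
    # Returns True when the given top-level package is importable in the
    # current environment. Converting a slow T5 tokenizer to a fast one
    # requires the sentencepiece package, which is what the ValueError
    # in this issue is complaining about.
    return importlib.util.find_spec(name) is not None

if not has_module("sentencepiece"):
    print("Run `pip install sentencepiece`, then restart the kernel "
          "before calling AutoTokenizer.from_pretrained.")
```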
I see it is the classic sentencepiece error; I should have read your error message more carefully 😉
Here is a Colab showing how it works: https://colab.research.google.com/drive/1QybYdj-1bW0MHD0cutWBPWas5IFEhSjC?usp=sharing