ValueError: Tokenizer class T5Tokenizer does not exist or is not currently imported.
Environment info
- `transformers` version: latest, transformers==4.2.0.dev0
- Platform: Colab
- Python version: 3.6.9
- PyTorch version (GPU?): torch==1.7.0+cu101
- Tensorflow version (GPU?):
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Who can help
Information
The following code, featured in the latest HF newsletter, seems to have issues: when I tried it, I got a tokenizer error under both the fast and slow conditions (`use_fast=True`/`False`).
The problem arises when using:
- the official example scripts: (give details below)
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("mrm8488/mT5-small-finetuned-tydiqa-for-xqa", use_fast=False)
model = AutoModelForSeq2SeqLM.from_pretrained("mrm8488/mT5-small-finetuned-tydiqa-for-xqa")

context = "HuggingFace won the best Demo paper at EMNLP2020."
question = "What won HuggingFace?"
input_text = 'question: %s context: %s' % (question, context)

features = tokenizer([input_text], return_tensors='pt')
output = model.generate(**features)
tokenizer.decode(output[0])
```
To reproduce
Steps to reproduce the behavior:
- Run the above code on Google Colab
ERROR reported:

```
ValueError                                Traceback (most recent call last)
<ipython-input-3-87256159791c> in <module>()
     10 from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
     11
---> 12 tokenizer = AutoTokenizer.from_pretrained("mrm8488/mT5-small-finetuned-tydiqa-for-xqa", use_fast=False)
     13
     14 model = AutoModelForSeq2SeqLM.from_pretrained("mrm8488/mT5-small-finetuned-tydiqa-for-xqa")

/usr/local/lib/python3.6/dist-packages/transformers/models/auto/tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    358         if tokenizer_class is None:
    359             raise ValueError(
--> 360                 "Tokenizer class {} does not exist or is not currently imported.".format(tokenizer_class_candidate)
    361             )
    362         return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)

ValueError: Tokenizer class T5Tokenizer does not exist or is not currently imported.
```
Issue Analytics
- Created 3 years ago
- Comments: 9 (4 by maintainers)
I had a similar problem:

```
ValueError: Tokenizer class M2M100Tokenizer does not exist or is not currently imported.
```

and solved it by running `pip install sentencepiece`. It seems that when the `sentencepiece` package is missing, `AutoTokenizer.from_pretrained` silently fails to load the tokenizer class and then crashes later.

This works fabulously with DeBERTa models as well; it seems the error message isn't very descriptive.