ValueError: Tokenizer class T5Tokenizer does not exist or is not currently imported.
Environment info
- `transformers` version: latest, transformers==4.2.0.dev0
- Platform: Colab
- Python version: 3.6.9
- PyTorch version (GPU?): torch==1.7.0+cu101
- Tensorflow version (GPU?):
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Who can help
Information
The following code, featured in the latest HF newsletter, seems to have issues: when I tried it, I got a tokenizer error under both the fast and slow conditions (`use_fast=True`/`False`).
The problem arises when using:
- the official example scripts: (give details below)
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("mrm8488/mT5-small-finetuned-tydiqa-for-xqa", use_fast=False)
model = AutoModelForSeq2SeqLM.from_pretrained("mrm8488/mT5-small-finetuned-tydiqa-for-xqa")

context = "HuggingFace won the best Demo paper at EMNLP2020."
question = "What won HuggingFace?"
input_text = 'question: %s context: %s' % (question, context)

features = tokenizer([input_text], return_tensors='pt')
output = model.generate(**features)
tokenizer.decode(output[0])
```
To reproduce
Steps to reproduce the behavior:
- Run the above code on Google Colab
ERROR reported:

```
ValueError                                Traceback (most recent call last)
<ipython-input-3-87256159791c> in <module>()
     10 from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
     11
---> 12 tokenizer = AutoTokenizer.from_pretrained("mrm8488/mT5-small-finetuned-tydiqa-for-xqa", use_fast=False)
     13
     14 model = AutoModelForSeq2SeqLM.from_pretrained("mrm8488/mT5-small-finetuned-tydiqa-for-xqa")

/usr/local/lib/python3.6/dist-packages/transformers/models/auto/tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    358         if tokenizer_class is None:
    359             raise ValueError(
--> 360                 "Tokenizer class {} does not exist or is not currently imported.".format(tokenizer_class_candidate)
    361             )
    362         return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)

ValueError: Tokenizer class T5Tokenizer does not exist or is not currently imported.
```
Issue Analytics
- Created 3 years ago
- Comments: 9 (4 by maintainers)
I had a similar problem:

```
ValueError: Tokenizer class M2M100Tokenizer does not exist or is not currently imported.
```

and solved it by running `pip install sentencepiece`. It seems that when the `sentencepiece` package is missing, `AutoTokenizer.from_pretrained` silently fails to load the tokenizer class and then crashes later.

This works fabulously with DeBERTa models as well; it seems the error message isn't very descriptive.