Cannot instantiate Tokenizer
I am using Hugging Face Transformers 4.0.0. When I instantiate the AutoTokenizer for indic-bert, I get the following error:
My code:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('ai4bharat/indic-bert')
Error:
Couldn’t instantiate the backend tokenizer from one of: (1) a tokenizers library serialization file, (2) a slow tokenizer instance to convert or (3) an equivalent slow tokenizer class to instantiate and convert. You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
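For context, Transformers 4.0.0 moved sentencepiece out of the required dependencies (see the release notes linked in the comments below), so sentencepiece-based tokenizers fail to load until it is installed. A minimal sketch of the fix commenters report, assuming the install succeeds and the Python process is restarted afterwards:

pip install sentencepiece

from transformers import AutoTokenizer

# With sentencepiece available, transformers can load the slow
# sentencepiece-based tokenizer and convert it to a fast one on the fly.
tokenizer = AutoTokenizer.from_pretrained('ai4bharat/indic-bert')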
Issue Analytics
- Created: 3 years ago
- Comments: 12 (4 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Here's the solution, sorry!
https://github.com/huggingface/transformers/releases/tag/v4.0.0
We must pass use_fast=False to the tokenizer! Thanks again!
Hi, I’m also having this problem. Trying to instantiate
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-nl", use_fast=False)
but I get “ValueError: This tokenizer cannot be instantiated. Please make sure you have sentencepiece installed in order to use this tokenizer.” But I have already installed sentencepiece. I have:
The above code snippet with “Musixmatch/umberto-wikipedia-uncased-v1” also doesn’t work for me.
Anyone have more ideas?
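One thing worth checking in the “already installed sentencepiece” case (my suggestion, not from this thread): transformers tests for sentencepiece when it is first imported, so a notebook kernel or long-running process started before the pip install must be restarted for the install to be picked up. A quick diagnostic sketch:

import importlib.util

# True means the current interpreter can see sentencepiece; if this prints
# False right after a pip install, restart the kernel/process and try again.
print(importlib.util.find_spec("sentencepiece") is not None)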