question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Cannot instantiate Tokenizer

See original GitHub issue

I am using Huggingface Transformers 4.0.0. When I instantiate the autotokenizer for indicbert, I get the following issue:

My code: tokenizer = AutoTokenizer.from_pretrained('ai4bharat/indic-bert')

Error: Couldn’t instantiate the backend tokenizer from one of: (1) a tokenizers library serialization file, (2) a slow tokenizer instance to convert or (3) an equivalent slow tokenizer class to instantiate and convert. You need to have sentencepiece installed to convert a slow tokenizer to a fast one.

Issue Analytics

  • State:closed
  • Created 3 years ago
  • Comments:12 (4 by maintainers)

github_iconTop GitHub Comments

7reactions
denocriscommented, Dec 16, 2020

Here the solution, sorry!

https://github.com/huggingface/transformers/releases/tag/v4.0.0

We must put use_fast=False in the tokenizer!

Thanks again!

6reactions
LauraIstcommented, Jan 7, 2021

Hi, I’m also having this problem. Trying to instantiate tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-nl", use_fast=False) but I get “ValueError: This tokenizer cannot be instantiated. Please make sure you have sentencepiece installed in order to use this tokenizer.”

But I have already installed sentencepiece. I have:

  • pip: 20.3.3
  • sentencepiece: 0.1.94
  • transformers: 4.1.1

The above code snippet with “Musixmatch/umberto-wikipedia-uncased-v1” also doesn’t work for me.

Anyone have more ideas?

Read more comments on GitHub >

github_iconTop Results From Across the Web

Error with new tokenizers (URGENT!) - Hugging Face Forums
Couldn't instantiate the backend tokenizer from one of: (1) a tokenizers library serialization file, (2) a slow tokenizer instance to convert or (3)...
Read more >
python | ValueError: Couldn't instantiate the backend tokenizer
First, I had to pip install sentencepiece . However, in the same code line, I was getting an error with sentencepiece .
Read more >
tf.keras.preprocessing.text.Tokenizer | TensorFlow v2.11.0
Returns the tokenizer configuration as Python dictionary. The word count dictionaries used by the tokenizer get serialized into plain JSON, so ...
Read more >
tokenize — Tokenizer for Python source — Python 3.11.1 ...
The tokenize module provides a lexical scanner for Python source code, implemented in Python. The scanner in this module returns comments as tokens...
Read more >
Tokenizers | Apache Solr Reference Guide 6.6
The class attribute names a factory class that will instantiate a tokenizer object when needed. Tokenizer factory classes implement the ...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found