I-BERT tokenizer not loading; example code not working.
Following the example here, I'm trying to load the 'kssteven/ibert-roberta-base' tokenizer:
from transformers import RobertaTokenizer
tokenizer = RobertaTokenizer.from_pretrained('kssteven/ibert-roberta-base')
It errors out as follows:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/carola/opt/anaconda3/envs/huggingface/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1710, in from_pretrained
    resolved_vocab_files, pretrained_model_name_or_path, init_configuration, *init_inputs, **kwargs
  File "/Users/carola/opt/anaconda3/envs/huggingface/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1781, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/Users/carola/opt/anaconda3/envs/huggingface/lib/python3.7/site-packages/transformers/models/roberta/tokenization_roberta.py", line 171, in __init__
    **kwargs,
  File "/Users/carola/opt/anaconda3/envs/huggingface/lib/python3.7/site-packages/transformers/models/gpt2/tokenization_gpt2.py", line 179, in __init__
    with open(vocab_file, encoding="utf-8") as vocab_handle:
TypeError: expected str, bytes or os.PathLike object, not NoneType
Using transformers version 4.5.1 on macOS or Ubuntu.
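The TypeError indicates that the tokenizer's vocab_file resolved to None: RobertaTokenizer (which reuses GPT2Tokenizer's file handling) ends up calling open() on a missing path because the repo's tokenizer config at the time couldn't be resolved to the expected vocabulary files. As a stopgap, one hedged workaround is to borrow the stock RoBERTa tokenizer; this assumes ibert-roberta-base shares the unmodified RoBERTa BPE vocabulary, which appears plausible since I-BERT quantizes a RoBERTa checkpoint:

from transformers import RobertaTokenizer

# Hedged workaround: load the stock roberta-base tokenizer instead.
# Assumption (not confirmed in this thread): ibert-roberta-base uses
# the same BPE vocabulary and merges as roberta-base.
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')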
It would be nice if you could open a new issue for this. Thanks.
@xiangsanliu this should be fixed now; the hardcoded tokenizers_file path has been removed (cf. hub commit https://huggingface.co/kssteven/ibert-roberta-base/commit/0857df571974cf0633da7536addb8b9da230293b). You could pass force_download=True to .from_pretrained to get the updated config file.
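For reference, the reload with the forced re-download would look like this (a minimal sketch; it assumes the hub fix above has landed):

from transformers import RobertaTokenizer

# Re-download the repo files so the updated tokenizer config (without the
# hardcoded tokenizers_file path) replaces any stale cached copy.
tokenizer = RobertaTokenizer.from_pretrained(
    'kssteven/ibert-roberta-base',
    force_download=True,
)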