broken models on the hub
Go to https://huggingface.co/sshleifer/distill-mbart-en-ro-12-6, click on “use in transformers”, copy-and-paste, and nope, you can’t use this in transformers:
```
python -c 'from transformers import AutoTokenizer; AutoTokenizer.from_pretrained("sshleifer/distill-mbart-en-ro-12-6")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/mnt/nvme1/code/huggingface/transformers-master/src/transformers/models/auto/tokenization_auto.py", line 410, in from_pretrained
    return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/mnt/nvme1/code/huggingface/transformers-master/src/transformers/tokenization_utils_base.py", line 1704, in from_pretrained
    return cls._from_pretrained(
  File "/mnt/nvme1/code/huggingface/transformers-master/src/transformers/tokenization_utils_base.py", line 1717, in _from_pretrained
    slow_tokenizer = (cls.slow_tokenizer_class)._from_pretrained(
  File "/mnt/nvme1/code/huggingface/transformers-master/src/transformers/tokenization_utils_base.py", line 1776, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/mnt/nvme1/code/huggingface/transformers-master/src/transformers/models/roberta/tokenization_roberta.py", line 159, in __init__
    super().__init__(
  File "/mnt/nvme1/code/huggingface/transformers-master/src/transformers/models/gpt2/tokenization_gpt2.py", line 179, in __init__
    with open(vocab_file, encoding="utf-8") as vocab_handle:
TypeError: expected str, bytes or os.PathLike object, not NoneType
```
This is with the latest master.
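For reference, the traceback goes through tokenization_roberta.py and then tokenization_gpt2.py, so AutoTokenizer resolved this repo to a RoBERTa-style (BPE) tokenizer, which expects a vocab.json; when that file can’t be found, vocab_file stays None and open(None) raises the TypeError. A quick way to see what the repo actually ships is to list its files - a sketch using today’s huggingface_hub API, which may differ from what existed when this issue was filed:

```python
# Diagnosis sketch (hypothetical): list the files in the hub repo to check
# whether the tokenizer files the config points at actually exist.
from huggingface_hub import list_repo_files

files = list_repo_files("sshleifer/distill-mbart-en-ro-12-6")
print("\n".join(sorted(files)))
# If vocab.json / merges.txt are missing while the config maps the repo to a
# BPE tokenizer, from_pretrained ends up calling open(None), as in the traceback.
```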
These, for example, I tested and they work fine:
sshleifer/distill-mbart-en-ro-12-4
sshleifer/distill-mbart-en-ro-12-9
Perhaps we need some sort of CI that goes over the public models, validates that running them in transformers succeeds, and sends an alert if it doesn’t? We have no idea how many other models are broken on the hub right now.
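A minimal sketch of what such a validation job could look like, assuming huggingface_hub’s list_models API; the author filter and the report format are illustrative only, not a proposal for the real implementation:

```python
# Sketch only: walk a slice of the hub, attempt a cheap load, collect failures.
from huggingface_hub import list_models
from transformers import AutoConfig, AutoTokenizer

failures = []
# Illustrative filter; a real job would paginate over all public models.
for info in list_models(author="sshleifer"):
    model_id = info.id
    try:
        # Config + tokenizer are cheap to fetch compared to the weights,
        # yet already catch errors like the one in this issue.
        AutoConfig.from_pretrained(model_id)
        AutoTokenizer.from_pretrained(model_id)
    except Exception as err:
        failures.append((model_id, repr(err)))

for model_id, err in failures:
    print(f"BROKEN: {model_id}: {err}")  # a real CI would send an alert here
```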
Issue Analytics
- Created: 3 years ago
- Reactions: 1
- Comments: 10 (9 by maintainers)
Top GitHub Comments
Note that this is not necessarily low-hanging fruit (depending on your definition of low-hanging fruit 😂), given that the hub is huge and the models on it keep changing.
I meant that just loading a model/tokenizer is cheaper, faster, and requires almost zero extra code to write - hence low-hanging fruit.
I hear you that the hub is huge - a little bit at a time. It would be the same code whether it validates 10 models or 7K models; if there is no urgency to complete it fast, it would just take much, much longer to finish.
That was exactly my point: the models change and the codebase does too, so it’s not enough to check once, even if we track when each model was last changed and when it was last validated.
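A rough sketch of that bookkeeping, assuming huggingface_hub’s model_info API (attribute naming varies across huggingface_hub versions; recent ones expose last_modified) and a hypothetical stored validation record:

```python
# Hypothetical re-validation check: a model needs re-checking if the repo or
# the installed transformers version changed since the last successful run.
from typing import Optional

from huggingface_hub import HfApi
import transformers

def needs_revalidation(model_id: str, record: Optional[dict]) -> bool:
    info = HfApi().model_info(model_id)
    if record is None:  # never validated before
        return True
    return (
        str(info.last_modified) != record["last_modified"]  # model changed
        or transformers.__version__ != record["transformers_version"]  # codebase changed
    )
```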