broken models on the hub

Go to https://huggingface.co/sshleifer/distill-mbart-en-ro-12-6, click on “use in transformers”, copy-n-paste the snippet, and nope, it can’t be loaded in transformers:

python -c 'from transformers import AutoTokenizer; AutoTokenizer.from_pretrained("sshleifer/distill-mbart-en-ro-12-6")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/mnt/nvme1/code/huggingface/transformers-master/src/transformers/models/auto/tokenization_auto.py", line 410, in from_pretrained
    return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/mnt/nvme1/code/huggingface/transformers-master/src/transformers/tokenization_utils_base.py", line 1704, in from_pretrained
    return cls._from_pretrained(
  File "/mnt/nvme1/code/huggingface/transformers-master/src/transformers/tokenization_utils_base.py", line 1717, in _from_pretrained
    slow_tokenizer = (cls.slow_tokenizer_class)._from_pretrained(
  File "/mnt/nvme1/code/huggingface/transformers-master/src/transformers/tokenization_utils_base.py", line 1776, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/mnt/nvme1/code/huggingface/transformers-master/src/transformers/models/roberta/tokenization_roberta.py", line 159, in __init__
    super().__init__(
  File "/mnt/nvme1/code/huggingface/transformers-master/src/transformers/models/gpt2/tokenization_gpt2.py", line 179, in __init__
    with open(vocab_file, encoding="utf-8") as vocab_handle:
TypeError: expected str, bytes or os.PathLike object, not NoneType

This is with the latest master.

These, for example, I tested and they work fine:

  • sshleifer/distill-mbart-en-ro-12-4
  • sshleifer/distill-mbart-en-ro-12-9

Perhaps we need some sort of CI that goes over the public models, validates that loading them with the transformers code succeeds, and sends an alert if it doesn’t? We have no idea how many other models on the hub are broken right now.
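A minimal sketch of the kind of check proposed above: try to load each model id and collect the ones that raise. The helper names `find_broken_models` and `report` are hypothetical, not an existing transformers or hub utility, and the loader is passed in as a callable (an assumption made here so the check can be exercised offline).

```python
from typing import Callable, Dict, Iterable


def find_broken_models(
    model_ids: Iterable[str],
    load: Callable[[str], object],
) -> Dict[str, str]:
    """Try to load every model id; return {model_id: error} for the failures.

    `load` is any callable that accepts a hub model id (hypothetical wiring:
    e.g. a tokenizer or model `from_pretrained`). It is injected so the check
    itself does not depend on network access.
    """
    failures: Dict[str, str] = {}
    for model_id in model_ids:
        try:
            load(model_id)
        except Exception as exc:  # any load error counts as "broken"
            failures[model_id] = f"{type(exc).__name__}: {exc}"
    return failures


def report(failures: Dict[str, str]) -> str:
    """Format failures as a plain-text alert; an empty string means all passed."""
    return "\n".join(f"{mid}: {err}" for mid, err in sorted(failures.items()))
```

In a real job one could pass `AutoTokenizer.from_pretrained` from transformers as `load`, e.g. `find_broken_models(["sshleifer/distill-mbart-en-ro-12-6"], AutoTokenizer.from_pretrained)`, which downloads files from the hub; how the resulting alert is delivered is left open here. Catching `Exception` broadly is deliberate: for this purpose, a `TypeError` like the one in the traceback counts as "broken" just as much as a missing file.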

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 10 (9 by maintainers)

Top GitHub Comments

julien-c commented, Mar 18, 2021 (2 reactions)

Note that this is not necessarily low-hanging fruit (depending on your definition of low-hanging fruit 😂), given that:

  • we have 7,000+ models whose total weights represent multiple TBs of data
  • they change over time

stas00 commented, Mar 18, 2021 (1 reaction)

I meant that just loading a model/tokenizer is cheaper, faster, and requires almost no extra code to write, hence low-hanging fruit.

I hear you that the hub is huge; it can be done a little bit at a time. The code would be the same whether it validates 10 models or 7K models. If there is no urgency to finish quickly, it would just take much, much longer to complete.

Regarding “they change over time”:

That was exactly my point: the models change, and so does the codebase, so it’s not enough to check a model once, even if we track when it was last changed and when it was last validated.
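One way to act on "a little bit at a time" while accounting for both models and code changing is to revalidate in small batches, picking the stalest models first. The sketch below is hypothetical: `ModelRecord`, its fields, and `next_batch` are illustrative bookkeeping, not a hub or transformers API, and the returned ids would be fed to whatever load check the CI runs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelRecord:
    """Hypothetical bookkeeping for one hub model (not a real hub API)."""
    model_id: str
    last_modified: float             # when the model repo last changed (epoch seconds)
    last_validated: Optional[float]  # when it was last checked, None if never
    validated_with: Optional[str]    # library version/commit used for that check


def next_batch(records: List[ModelRecord], code_version: str, batch_size: int) -> List[str]:
    """Pick up to `batch_size` model ids that need (re)validation.

    A model is stale if it was never validated, changed after its last check,
    or was last checked against a different version of the library code.
    """
    stale = [
        r for r in records
        if r.last_validated is None
        or r.last_modified > r.last_validated
        or r.validated_with != code_version
    ]
    # Never-validated models first, then the ones checked longest ago.
    # sort() is stable, so ties keep their input order.
    stale.sort(key=lambda r: float("-inf") if r.last_validated is None else r.last_validated)
    return [r.model_id for r in stale[:batch_size]]
```

Keying staleness on the code version as well as on the model's modification time reflects the point above: a model that loaded fine when it was last checked can still break after a change to the library.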
