
Some community models are broken and can't be downloaded


šŸ› Bug

Information

Model I am using (Bert, XLNet …): Community Models

Language I am using the model on (English, Chinese …): Multiple different ones

Quite a few community models can't be loaded. The stats are below:

Stats

  1. 68 models can't load their config and/or their tokenizer:
  • a) 34 models can't even load their config file. The reasons for this are either:

    • i. 11/34: The model identifier is wrong, e.g. albert-large does not exist anymore; it seems it was renamed to albert-large-v1. These models are listed online under a different name than the one they are saved under on AWS.

    • ii. 23/34: There is an unrecognized model_type in the config.json, e.g.

"Error: Message: Unrecognized model in hfl/rbtl3. Should have a model_type key in its config.json, or contain one of the following strings in its name: t5, distilbert, albert, camembert, xlm-roberta, bart, roberta, flaubert, bert, openai-gpt, gpt2, transfo-xl, xlnet, xlm, ctrl "

  • b) 33 models can load their config, but cannot load their tokenizer. The error message is almost always the same:

TOK ERROR: clue/roberta_chinese_base tokenizer can not be loaded. Message: Model name 'clue/roberta_chinese_base' was not found in tokenizers model name list (roberta-base, roberta-large, roberta-large-mnli, distilroberta-base, roberta-base-openai-detector, roberta-large-openai-detector). We assumed 'clue/roberta_chinese_base' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.json', 'merges.txt'] but couldn't find such vocabulary files at this path or url.

    • i. Here the model has none of:
      - vocab_file
      - added_tokens_file
      - special_tokens_map_file
      - tokenizer_config_file
  2. 79 models currently have a wrong pad_token_id, eos_token_id, or bos_token_id in their config. IMPORTANT: The reason for this is that we used to have wrong defaults saved in PretrainedConfig() - see e.g. here: the default value of pad_token_id for any model was 0. People trained a model with the library, saved it, and the resulting config.json then had pad_token_id = 0 saved. This was then uploaded. But it's wrong and should be corrected.

  3. For 162 models everything is fine!

The full analysis log is here. The code that created this log (a simple comparison of the loaded tokenizer and config with the default config) is here.
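The linked script is not reproduced on this page, but a minimal sketch of that kind of check might look as follows. The model ids are examples taken from this issue, and the error handling is simplified; treat it as an illustration, not the actual analysis script:

```python
# Minimal sketch of the analysis: for each community model, try to load the
# config and the tokenizer, then compare the special-token ids saved in the
# config against the tokenizer. Model ids below are examples from this issue.
from transformers import AutoConfig, AutoTokenizer

model_ids = ["hfl/rbtl3", "clue/roberta_chinese_base"]

for model_id in model_ids:
    # 1 a) can the config be loaded at all?
    try:
        config = AutoConfig.from_pretrained(model_id)
    except Exception as e:
        print(f"CONFIG ERROR: {model_id}. Message: {e}")
        continue

    # 1 b) can the tokenizer be loaded?
    try:
        tokenizer = AutoTokenizer.from_pretrained(model_id)
    except Exception as e:
        print(f"TOK ERROR: {model_id} tokenizer can not be loaded. Message: {e}")
        continue

    # 2) do the special-token ids in the config match the tokenizer?
    for attr in ("pad_token_id", "eos_token_id", "bos_token_id"):
        config_value = getattr(config, attr, None)
        tokenizer_value = getattr(tokenizer, attr, None)
        if config_value is not None and config_value != tokenizer_value:
            print(f"ID MISMATCH: {model_id}: config.{attr}={config_value} "
                  f"!= tokenizer.{attr}={tokenizer_value}")
```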

HOW-TO-FIX-STEPS (in the following order):

  • Fix 1 a) i. first: All models that have a wrong model identifier path should get the correct one. Need to update some model identifier paths on https://huggingface.co/models like changing bertabs-finetuned-xsum-extractive-abstractive-summarization to remi/bertabs-finetuned-xsum-extractive-abstractive-summarization. Some of those errors are very weird, see #3358

  • Fix 1 a) ii. should be quite easy: add the correct model_type to the config.json

  • Fix 1 b) Not sure how to fix the missing tokenizer files most efficiently @julien-c

  • Fix 2) Create automated script that:

    1. If tokenizer.pad_token_id != default_config.pad_token_id -> config.pad_token_id = tokenizer.pad_token_id; else remove pad_token_id.
    2. Removes all eos_token_ids -> they don't exist anymore. (A sketch of such a script is shown below.)
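For illustration, here is a hedged sketch of that clean-up, assuming it runs against a local checkout of a model repo. OLD_PAD_DEFAULT mirrors the buggy default mentioned above and the path handling is an assumption; the real script would also need to re-upload the corrected config:

```python
# Sketch of the Fix 2 clean-up on a local model directory: correct or drop
# pad_token_id based on the tokenizer, and remove the obsolete eos_token_ids.
import json
from pathlib import Path

from transformers import AutoTokenizer

OLD_PAD_DEFAULT = 0  # the wrong pad_token_id default PretrainedConfig used to ship

def fix_config(model_dir: str) -> None:
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text())
    tokenizer = AutoTokenizer.from_pretrained(model_dir)

    # If the tokenizer disagrees with the old default, trust the tokenizer;
    # otherwise the saved value is just the buggy default, so drop it.
    if tokenizer.pad_token_id is not None and tokenizer.pad_token_id != OLD_PAD_DEFAULT:
        config["pad_token_id"] = tokenizer.pad_token_id
    else:
        config.pop("pad_token_id", None)

    # eos_token_ids (plural) no longer exists as a config attribute: remove it.
    config.pop("eos_token_ids", None)

    config_path.write_text(json.dumps(config, indent=2) + "\n")
```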

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 5
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
XiangQinYu commented, May 29, 2021

When I used the ernie model pretrained by BaiDu, I had the same problem. My solution was to add "model_type": "bert" to the configuration file. It worked, but I don't know if it's reasonable.
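For reference, that workaround amounts to a small local edit of the model's config.json; a sketch (the file path is an assumption):

```python
# Sketch of the workaround above: add "model_type": "bert" to a locally
# downloaded config.json so AutoConfig/AutoModel can dispatch to the BERT
# classes.
import json

with open("config.json") as f:
    config = json.load(f)

config["model_type"] = "bert"  # ERNIE 1.0 checkpoints follow the BERT architecture

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```

This is generally reasonable for ERNIE 1.0-style checkpoints, which share BERT's architecture, but it should be verified for any other model before relying on it.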

0 reactions
drussellmrichie commented, Aug 23, 2021

> When I used the ernie model pretrained by BaiDu, I had the same problem. My solution was to add "model_type": "bert" to the configuration file. It worked, but I don't know if it's reasonable.

Hi, @XiangQinYu. I'm a bit of a newbie with Huggingface. Can you say more about how you did this? I guess you mean adding "model_type": "bert" to a file like this. But how did you edit the file? Did you download the whole model repository, and edit and run it locally?

EDIT: Never mind, I figured it out with the help of a commenter on a question I asked on SO.
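For anyone landing here with the same question, one way to do this is to download the repo locally, patch config.json, and load from the local directory. The model id below is just an example, and this assumes a reasonably recent huggingface_hub (for the local_dir argument) and a checkpoint that really is BERT-compatible:

```python
# Download a model repo locally, add the missing model_type, then load the
# model from the patched local copy instead of the hub id.
import json
from pathlib import Path

from huggingface_hub import snapshot_download
from transformers import AutoModel, AutoTokenizer

local_dir = snapshot_download("nghuyong/ernie-1.0", local_dir="ernie-1.0")
config_path = Path(local_dir) / "config.json"

config = json.loads(config_path.read_text())
config.setdefault("model_type", "bert")  # only added if it is missing
config_path.write_text(json.dumps(config, indent=2))

model = AutoModel.from_pretrained(local_dir)
tokenizer = AutoTokenizer.from_pretrained(local_dir)
```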
