Revisions not working as expected
We were getting a size mismatch when loading a finetuned checkpoint. After looking at the model config, I found that it had been updated and that the embedding/vocab size had increased. This is slightly annoying but not the core of this issue. My way of dealing with this was, naturally, to rely on version control and simply use the previous commit, which still had the config that we used for finetuning (this one). I expected to be able to load that revision of the model using the commit identifier shown on the website:
from transformers import AutoModel
model_name = "GroNLP/bert-base-dutch-cased"
revision = "61330c1"
model = AutoModel.from_pretrained(model_name, revision=revision)
This does not work and throws an error saying that the model cannot be found, with the following message:
OSError: Can't load config for 'GroNLP/bert-base-dutch-cased'. Make sure that:
- 'GroNLP/bert-base-dutch-cased' is a correct model identifier listed on 'https://huggingface.co/models'
- or 'GroNLP/bert-base-dutch-cased' is the correct path to a directory containing a config.json file
A first improvement would be to add something about revisions to this error message, because GroNLP/bert-base-dutch-cased is obviously a correct name. The deeper issue is that the model revision is simply not found when I use the short commit identifier shown on the website. By coincidence I noticed that the URL includes a much longer identifier, the full commit hash, which starts with the short commit identifier that you can see on the website. When you try that, the code does run and the revision is correctly loaded.
from transformers import AutoModel
model_name = "GroNLP/bert-base-dutch-cased"
revision = "61330c1ca1aa3a688f8aa015059142a1b20d3f63"
model = AutoModel.from_pretrained(model_name, revision=revision)
So the bug is either
- the model is not capable of looking up a revision based on the first seven characters of a hash (not sure if it should/could),
- or the model hub website does not provide enough information to make this intuitive for users.
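Until one of these is addressed, the short identifier can be expanded by hand before calling from_pretrained. The following is only a sketch of that workaround, not something transformers does itself. The helper names `resolve_short_hash` and `full_hash_from_hub` are made up for illustration, and the sketch assumes a recent `huggingface_hub` release in which `HfApi.list_repo_commits` is available.

```python
from typing import Iterable


def resolve_short_hash(prefix: str, commit_ids: Iterable[str]) -> str:
    """Return the unique full commit hash that starts with `prefix`.

    Hypothetical helper: raises ValueError if no commit, or more than
    one commit, matches the abbreviated hash.
    """
    prefix = prefix.lower()
    matches = sorted({c for c in commit_ids if c.lower().startswith(prefix)})
    if not matches:
        raise ValueError(f"No commit starts with {prefix!r}")
    if len(matches) > 1:
        raise ValueError(f"Prefix {prefix!r} is ambiguous: {matches}")
    return matches[0]


def full_hash_from_hub(repo_id: str, prefix: str) -> str:
    """Fetch the repo's commit history from the Hub and expand `prefix`.

    Needs network access. Assumes `HfApi.list_repo_commits` exists in the
    installed huggingface_hub version.
    """
    # Imported lazily so resolve_short_hash works without huggingface_hub.
    from huggingface_hub import HfApi

    commits = HfApi().list_repo_commits(repo_id)
    return resolve_short_hash(prefix, (c.commit_id for c in commits))
```

With this, `revision = full_hash_from_hub("GroNLP/bert-base-dutch-cased", "61330c1")` could be passed to `AutoModel.from_pretrained(model_name, revision=revision)`, provided the commit is still part of the repository history.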
One thing that would help, for instance, is for the “use in transformers” button to adapt itself to the revision that a user is currently browsing, so that when clicked it includes the revision (if any) in the example usage. And/or a copy function could be added to the commit identifier that, when clicked, copies the whole hash.
Who can help
Not sure who to tag for the model page so tagging @sgugger and @LysandreJik
Hey @BramVanroy (and @sgugger) we solved this through better UX, actually:
if you take a look at commit history on https://huggingface.co/bert-base-uncased/commits/main you now have buttons to copy the full commit hash (exactly like on GitHub), thanks to @beurkinger on the Hub team.
Hope this helps!
Hi @BramVanroy, thanks for opening an issue! This is also tracked in https://github.com/huggingface/huggingface_hub/issues/197 cc @julien-c @Pierrci
There’s definitely an improvement to be made regarding the mention of the revision in the error message. Feel free to give it a try if you have the time, otherwise we’ll take care of it ASAP.
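As a rough illustration of what mentioning the revision could look like, here is a sketch that builds such a message. It is not the actual transformers implementation: the function name `load_error_message` and the added wording are assumptions, and the check for an abbreviated hash is only a heuristic.

```python
import re
from typing import Optional

_HEX = re.compile(r"^[0-9a-f]+$")
_FULL_HASH_LENGTH = 40  # length of a full Git SHA-1 commit hash


def load_error_message(model_name: str, revision: Optional[str] = None) -> str:
    """Build a config-loading error message that also mentions the revision.

    Hypothetical helper; the first three lines mirror the message quoted
    in the issue, the rest is an illustrative addition.
    """
    lines = [
        f"Can't load config for '{model_name}'. Make sure that:",
        f"- '{model_name}' is a correct model identifier listed on 'https://huggingface.co/models'",
        f"- or '{model_name}' is the correct path to a directory containing a config.json file",
    ]
    if revision is not None:
        lines.append(
            f"- and '{revision}' is a valid revision (branch name, tag name, "
            "or commit hash) of that repository"
        )
        # Heuristic: a short, purely hexadecimal string is probably an
        # abbreviated commit hash (a branch named e.g. "beef" would also match).
        if _HEX.match(revision.lower()) and len(revision) < _FULL_HASH_LENGTH:
            lines.append(
                "  ('" + revision + "' looks like an abbreviated commit hash; "
                "try the full 40-character hash instead)"
            )
    return "\n".join(lines)
```

Calling `load_error_message("GroNLP/bert-base-dutch-cased", "61330c1")` would then add two lines pointing the user at the revision, while a call without a revision reproduces the original three-line message.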