[pretrained] model classes don't check the architecture of the pretrained model they load
While comparing different models fine-tuned on XSum (most of which are BART), I made a mistake and passed "google/pegasus-xsum" to BartForConditionalGeneration:
from transformers import BartForConditionalGeneration
BartForConditionalGeneration.from_pretrained("google/pegasus-xsum")
I got:
Some weights of the model checkpoint at google/pegasus-xsum were not used when initializing BartForConditionalGeneration: ['model.encoder.layer_norm.weight', 'model.encoder.layer_norm.bias', 'model.decoder.layer_norm.weight', 'model.decoder.layer_norm.bias']
- This IS expected if you are initializing BartForConditionalGeneration from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BartForConditionalGeneration from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BartForConditionalGeneration were not initialized from the model checkpoint at google/pegasus-xsum and are newly initialized: ['model.encoder.embed_positions.weight', 'model.encoder.layernorm_embedding.weight', 'model.encoder.layernorm_embedding.bias', 'model.decoder.embed_positions.weight', 'model.decoder.layernorm_embedding.weight', 'model.decoder.layernorm_embedding.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
File "./bart-summarize2.py", line 8, in <module>
tokenizer = BartTokenizer.from_pretrained(mname)
File "/mnt/nvme1/code/huggingface/transformers-master/src/transformers/tokenization_utils_base.py", line 1788, in from_pretrained
return cls._from_pretrained(
File "/mnt/nvme1/code/huggingface/transformers-master/src/transformers/tokenization_utils_base.py", line 1860, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/mnt/nvme1/code/huggingface/transformers-master/src/transformers/models/roberta/tokenization_roberta.py", line 159, in __init__
super().__init__(
File "/mnt/nvme1/code/huggingface/transformers-master/src/transformers/models/gpt2/tokenization_gpt2.py", line 179, in __init__
with open(vocab_file, encoding="utf-8") as vocab_handle:
TypeError: expected str, bytes or os.PathLike object, not NoneType
Is there a reason the model class doesn't check that it's being fed the wrong architecture? It could detect the mismatch and raise a corresponding error message, rather than spitting out seemingly random errors like the ones above. I was pretty sure it was a bug in the Pegasus model until I noticed that Pegasus != BART.
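One way such a check could work: compare the model_type declared in the checkpoint's config against the one the model class expects, before any weights or tokenizer files are touched. A minimal sketch, assuming a plain helper function; the name check_model_type and its wiring are hypothetical, not the actual transformers implementation:

```python
def check_model_type(expected, actual, class_name, checkpoint):
    """Hypothetical guard: raise a clear error when a checkpoint's
    declared model_type does not match the model class loading it."""
    if actual != expected:
        raise ValueError(
            f"Checkpoint '{checkpoint}' declares model_type '{actual}', "
            f"but {class_name} expects '{expected}'. You are probably "
            "loading the wrong architecture."
        )

# The config of google/pegasus-xsum declares "pegasus", while
# BartForConditionalGeneration expects "bart", so the guard fires:
try:
    check_model_type("bart", "pegasus",
                     "BartForConditionalGeneration", "google/pegasus-xsum")
except ValueError as err:
    print(err)
```

With a guard like this, the mismatch above would fail with one explicit message instead of a weight-initialization warning followed by an unrelated TypeError from the tokenizer.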
Thanks.
Issue Analytics
- Created: 3 years ago
- Comments: 12 (10 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hi, I've made some progress on this issue. I think I've fixed it for instantiating models. Shall I submit a PR to show whether my approach is fine? I've essentially added an assert statement in the from_pretrained method of the PretrainedConfig class.

Hi @LysandreJik, is someone working on this? I'd like to make my first contribution to the project.