
[pretrained] model classes aren't checking the arch of the pretrained model they load

See original GitHub issue

While comparing different models trained on XSum (most of which are BART variants), I made a mistake and passed “google/pegasus-xsum” to BartForConditionalGeneration:

from transformers import BartForConditionalGeneration

BartForConditionalGeneration.from_pretrained("google/pegasus-xsum")

I got:

Some weights of the model checkpoint at google/pegasus-xsum were not used when initializing BartForConditionalGeneration: ['model.encoder.layer_norm.weight', 'model.encoder.layer_norm.bias', 'model.decoder.layer_norm.weight', 'model.decoder.layer_norm.bias']
- This IS expected if you are initializing BartForConditionalGeneration from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BartForConditionalGeneration from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BartForConditionalGeneration were not initialized from the model checkpoint at google/pegasus-xsum and are newly initialized: ['model.encoder.embed_positions.weight', 'model.encoder.layernorm_embedding.weight', 'model.encoder.layernorm_embedding.bias', 'model.decoder.embed_positions.weight', 'model.decoder.layernorm_embedding.weight', 'model.decoder.layernorm_embedding.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "./bart-summarize2.py", line 8, in <module>
    tokenizer = BartTokenizer.from_pretrained(mname)
  File "/mnt/nvme1/code/huggingface/transformers-master/src/transformers/tokenization_utils_base.py", line 1788, in from_pretrained
    return cls._from_pretrained(
  File "/mnt/nvme1/code/huggingface/transformers-master/src/transformers/tokenization_utils_base.py", line 1860, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/mnt/nvme1/code/huggingface/transformers-master/src/transformers/models/roberta/tokenization_roberta.py", line 159, in __init__
    super().__init__(
  File "/mnt/nvme1/code/huggingface/transformers-master/src/transformers/models/gpt2/tokenization_gpt2.py", line 179, in __init__
    with open(vocab_file, encoding="utf-8") as vocab_handle:
TypeError: expected str, bytes or os.PathLike object, not NoneType

Any reason why the model class doesn’t check that it’s being fed the wrong architecture? It could detect the mismatch and raise a corresponding error message, rather than spitting out seemingly random errors like the ones above. I was pretty sure it was a bug in the pegasus model until I noticed that pegasus != Bart.

Thanks.

@LysandreJik
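
For context, the NoneType failure in the traceback comes from the tokenizer rather than the model: BartTokenizer looks for a GPT-2-style vocab.json, which the sentencepiece-based pegasus checkpoint doesn’t ship, so vocab_file ends up as None. Until the library performs such a check itself, a caller can guard against the mismatch by inspecting the checkpoint’s config first. A minimal sketch, assuming only the public AutoConfig API ("bart" is the model_type string that BART configs declare):

from transformers import AutoConfig, BartForConditionalGeneration

mname = "google/pegasus-xsum"

# config.json on the Hub declares the checkpoint's architecture via model_type
config = AutoConfig.from_pretrained(mname)
if config.model_type != "bart":
    raise ValueError(
        f"{mname} is a {config.model_type!r} checkpoint, not a BART one; "
        "load it with its own model class instead"
    )

model = BartForConditionalGeneration.from_pretrained(mname)

Alternatively, AutoModelForSeq2SeqLM.from_pretrained(mname) sidesteps the problem entirely: it reads config.json and dispatches to the matching architecture.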

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 12 (10 by maintainers)

Top GitHub Comments

1 reaction
vimarshc commented, Mar 7, 2021

Hi, I’ve made some progress on this issue. I think I’ve fixed it for instantiating models. Shall I submit a PR to show whether my approach is fine?

I’ve essentially added an assert statement in the from_pretrained method in the PretrainedConfig class.
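
A minimal sketch of the kind of check described, layered onto BartConfig via a subclass rather than patching transformers itself (the placement and wording of the assert are assumptions; get_config_dict and the model_type class attribute are existing transformers internals):

from transformers import BartConfig

# Hypothetical illustration of the described assert: compare the checkpoint's
# declared model_type against the config class doing the loading.
class CheckedBartConfig(BartConfig):
    @classmethod
    def from_pretrained(cls, name_or_path, **kwargs):
        # get_config_dict fetches and parses config.json without
        # instantiating the config object
        config_dict, _ = cls.get_config_dict(name_or_path, **kwargs)
        loaded = config_dict.get("model_type")
        # cls.model_type is "bart" for BartConfig and its subclasses
        assert loaded == cls.model_type, (
            f"{name_or_path} declares model_type {loaded!r}, "
            f"but {cls.__name__} expects {cls.model_type!r}"
        )
        return super().from_pretrained(name_or_path, **kwargs)

# CheckedBartConfig.from_pretrained("google/pegasus-xsum")  # -> AssertionError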

1 reaction
ankh6 commented, Feb 23, 2021

Hi @LysandreJik, is someone working on this? I’d like to make my first contribution to the project.

Read more comments on GitHub >
