
['encoder.version', 'decoder.version'] are unexpected when loading a pretrained BART model


Using the example from the BART docs (https://huggingface.co/transformers/model_doc/bart.html#bartforconditionalgeneration):

from transformers import BartTokenizer, BartForConditionalGeneration
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')
TXT = "My friends are <mask> but they eat too many carbs."

model = BartForConditionalGeneration.from_pretrained('facebook/bart-large')
input_ids = tokenizer([TXT], return_tensors='pt')['input_ids']
logits = model(input_ids)[0]

masked_index = (input_ids[0] == tokenizer.mask_token_id).nonzero().item()
probs = logits[0, masked_index].softmax(dim=0)
values, predictions = probs.topk(5)

print(tokenizer.decode(predictions).split())

gives:

Some weights of the model checkpoint at facebook/bart-large were not used 
when initializing BartForConditionalGeneration: 
['encoder.version', 'decoder.version']

- This IS expected if you are initializing BartForConditionalGeneration from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BartForConditionalGeneration from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
test:9: UserWarning: This overload of nonzero is deprecated:
        nonzero()
Consider using one of the following signatures instead:
        nonzero(*, bool as_tuple) (Triggered internally at  /opt/conda/conda-bld/pytorch_1597302504919/work/torch/csrc/utils/python_arg_parser.cpp:864.)
  masked_index = (input_ids[0] == tokenizer.mask_token_id).nonzero().item()
['good', 'great', 'all', 'really', 'very']

Well, there is one more issue: the example uses a deprecated nonzero() invocation, which since PyTorch 1.5 is expected to be called with the as_tuple arg, and that is what the UserWarning above is about: https://github.com/pytorch/pytorch/issues/43425
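For reference, a minimal way to keep the same logic without the deprecation warning (assuming a single <mask> token in the input, as in this example) is to pass as_tuple explicitly:

# nonzero(as_tuple=True) returns one index tensor per dimension; input_ids[0]
# is 1-D, so take the first (and only) tensor and read out the single position
masked_index = (input_ids[0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0].item()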

We already have authorized_missing_keys = [r"final_logits_bias", r"encoder\.version", r"decoder\.version"] (https://github.com/huggingface/transformers/blob/master/src/transformers/modeling_bart.py#L942), which correctly filters missing_keys. Should there also be an authorized_unexpected_keys that would clean up unexpected_keys the same way?
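For illustration only, here is a self-contained sketch of the proposed filtering; the authorized_unexpected_keys attribute is hypothetical and simply mirrors how the authorized_missing_keys patterns are applied to missing_keys today:

import re

# hypothetical: patterns that would be stripped from unexpected_keys before
# the "Some weights ... were not used" warning is assembled
authorized_unexpected_keys = [r"encoder\.version", r"decoder\.version"]

unexpected_keys = ['encoder.version', 'decoder.version']
for pat in authorized_unexpected_keys:
    unexpected_keys = [k for k in unexpected_keys if re.search(pat, k) is None]

print(unexpected_keys)  # [] -> nothing left to warn about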

(Note: I re-edited this issue once I understood it better, to save readers' time; the edit history is there if someone needs it.)

And I found another variant of it, this time for ['model.encoder.version', 'model.decoder.version']:

tests/test_modeling_bart.py::BartModelIntegrationTests::test_mnli_inference Some weights of the model checkpoint at facebook/bart-large-mnli were not used when initializing BartForSequenceClassification: ['model.encoder.version', 'model.decoder.version']
- This IS expected if you are initializing BartForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BartForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
PASSED

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

2 reactions
LysandreJik commented, Sep 10, 2020

The simplest and cleanest way would probably be to simply remove these two variables from the state dict, wouldn't it? If you reconvert the checkpoint you have to check that it is exactly the same as the previous one, which sounds like more of a pain and more error-prone than simply doing:

# download the original checkpoint
!wget https://cdn.huggingface.co/facebook/bart-large/pytorch_model.bin

import torch

# drop the two unused keys from the state dict and save a cleaned copy
weights = torch.load('/path/to/pytorch_model.bin')
del weights['encoder.version']
del weights['decoder.version']
torch.save(weights, 'new_pytorch_model.bin')
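As a sanity check (not part of the thread: '/path/to/cleaned-bart-large' is a placeholder for a local directory holding config.json plus the cleaned file renamed back to pytorch_model.bin), reloading should no longer list the two keys:

from transformers import BartForConditionalGeneration

# expect no "Some weights of the model checkpoint ... were not used" warning
model = BartForConditionalGeneration.from_pretrained('/path/to/cleaned-bart-large')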

1 reaction
sshleifer commented, Sep 10, 2020

Done. Also converted weights to fp16.
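For context, a minimal sketch of what such an fp16 conversion typically looks like (the exact steps used here aren't shown in the thread):

import torch

weights = torch.load('new_pytorch_model.bin', map_location='cpu')
# cast only the floating-point tensors; integer buffers stay as they are
weights = {k: v.half() if v.is_floating_point() else v for k, v in weights.items()}
torch.save(weights, 'pytorch_model.bin')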
