
['encoder.version', 'decoder.version'] are unexpected when loading a pretrained BART model


Using the example from the BART docs (https://huggingface.co/transformers/model_doc/bart.html#bartforconditionalgeneration):

from transformers import BartTokenizer, BartForConditionalGeneration
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')
TXT = "My friends are <mask> but they eat too many carbs."

model = BartForConditionalGeneration.from_pretrained('facebook/bart-large')
input_ids = tokenizer([TXT], return_tensors='pt')['input_ids']
logits = model(input_ids)[0]

masked_index = (input_ids[0] == tokenizer.mask_token_id).nonzero().item()
probs = logits[0, masked_index].softmax(dim=0)
values, predictions = probs.topk(5)

print(tokenizer.decode(predictions).split())

gives:

Some weights of the model checkpoint at facebook/bart-large were not used 
when initializing BartForConditionalGeneration: 
['encoder.version', 'decoder.version']

- This IS expected if you are initializing BartForConditionalGeneration from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BartForConditionalGeneration from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
test:9: UserWarning: This overload of nonzero is deprecated:
        nonzero()
Consider using one of the following signatures instead:
        nonzero(*, bool as_tuple) (Triggered internally at  /opt/conda/conda-bld/pytorch_1597302504919/work/torch/csrc/utils/python_arg_parser.cpp:864.)
  masked_index = (input_ids[0] == tokenizer.mask_token_id).nonzero().item()
['good', 'great', 'all', 'really', 'very']

Well, there is one more issue: the example uses a deprecated nonzero() invocation, which since PyTorch 1.5 is expected to be called with the as_tuple arg, and that is what the UserWarning above is about: https://github.com/pytorch/pytorch/issues/43425
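For reference, a minimal way to keep the same logic without the deprecation warning (assuming a single <mask> token in the input, as in this example) is to pass as_tuple explicitly:

# nonzero(as_tuple=True) returns one index tensor per dimension; input_ids[0]
# is 1-D, so take the first (and only) tensor and read out the single position
masked_index = (input_ids[0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0].item()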

We already have authorized_missing_keys = [r"final_logits_bias", r"encoder\.version", r"decoder\.version"] (https://github.com/huggingface/transformers/blob/master/src/transformers/modeling_bart.py#L942), which correctly filters missing_keys. Should there also be an authorized_unexpected_keys that would clean up unexpected_keys the same way?
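For illustration only, here is a self-contained sketch of the proposed filtering; the authorized_unexpected_keys attribute is hypothetical and simply mirrors how the authorized_missing_keys patterns are applied to missing_keys today:

import re

# hypothetical: patterns that would be stripped from unexpected_keys before
# the "Some weights ... were not used" warning is assembled
authorized_unexpected_keys = [r"encoder\.version", r"decoder\.version"]

unexpected_keys = ['encoder.version', 'decoder.version']
for pat in authorized_unexpected_keys:
    unexpected_keys = [k for k in unexpected_keys if re.search(pat, k) is None]

print(unexpected_keys)  # [] -> nothing left to warn about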

(Note: I re-edited this issue once I understood it better, to save readers' time; the edit history is there if someone needs it.)

And I found another variant of it, this time for ['model.encoder.version', 'model.decoder.version']:

tests/test_modeling_bart.py::BartModelIntegrationTests::test_mnli_inference Some weights of the model checkpoint at facebook/bart-large-mnli were not used when initializing BartForSequenceClassification: ['model.encoder.version', 'model.decoder.version']
- This IS expected if you are initializing BartForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BartForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
PASSED

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

2 reactions
LysandreJik commented, Sep 10, 2020

The simplest and cleanest way would probably be to simply remove these two variables from the state dict, wouldn't it? If you reconvert the checkpoint you have to check that it is exactly the same as the previous one, which sounds like more of a pain and more error-prone than simply doing:

# download the original checkpoint
!wget https://cdn.huggingface.co/facebook/bart-large/pytorch_model.bin

import torch

# drop the two unused keys from the state dict and save a cleaned copy
weights = torch.load('/path/to/pytorch_model.bin')
del weights['encoder.version']
del weights['decoder.version']
torch.save(weights, 'new_pytorch_model.bin')
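As a sanity check (not part of the thread: '/path/to/cleaned-bart-large' is a placeholder for a local directory holding config.json plus the cleaned file renamed back to pytorch_model.bin), reloading should no longer list the two keys:

from transformers import BartForConditionalGeneration

# expect no "Some weights of the model checkpoint ... were not used" warning
model = BartForConditionalGeneration.from_pretrained('/path/to/cleaned-bart-large')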

1 reaction
sshleifer commented, Sep 10, 2020

Done. Also converted weights to fp16.
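For context, a minimal sketch of what such an fp16 conversion typically looks like (the exact steps used here aren't shown in the thread):

import torch

weights = torch.load('new_pytorch_model.bin', map_location='cpu')
# cast only the floating-point tensors; integer buffers stay as they are
weights = {k: v.half() if v.is_floating_point() else v for k, v in weights.items()}
torch.save(weights, 'pytorch_model.bin')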
