['encoder.version', 'decoder.version'] are unexpected when loading a pretrained BART model
Using an example from the bart doc: https://huggingface.co/transformers/model_doc/bart.html#bartforconditionalgeneration
```python
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')
TXT = "My friends are <mask> but they eat too many carbs."
model = BartForConditionalGeneration.from_pretrained('facebook/bart-large')
input_ids = tokenizer([TXT], return_tensors='pt')['input_ids']
logits = model(input_ids)[0]
masked_index = (input_ids[0] == tokenizer.mask_token_id).nonzero().item()
probs = logits[0, masked_index].softmax(dim=0)
values, predictions = probs.topk(5)
print(tokenizer.decode(predictions).split())
```
gives:
```
Some weights of the model checkpoint at facebook/bart-large were not used when initializing BartForConditionalGeneration: ['encoder.version', 'decoder.version']
- This IS expected if you are initializing BartForConditionalGeneration from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BartForConditionalGeneration from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
test:9: UserWarning: This overload of nonzero is deprecated:
    nonzero()
Consider using one of the following signatures instead:
    nonzero(*, bool as_tuple) (Triggered internally at /opt/conda/conda-bld/pytorch_1597302504919/work/torch/csrc/utils/python_arg_parser.cpp:864.)
  masked_index = (input_ids[0] == tokenizer.mask_token_id).nonzero().item()
['good', 'great', 'all', 'really', 'very']
```
Well, there is one more issue here: the example uses a deprecated `nonzero()` invocation. Since PyTorch 1.5 there is a (strangely under-documented) requirement to pass the `as_tuple` arg: https://github.com/pytorch/pytorch/issues/43425
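For reference, one way to avoid the deprecation warning; a minimal sketch assuming PyTorch >= 1.5 and exactly one `<mask>` token in the input:

```python
# nonzero(as_tuple=True) returns one index tensor per dimension;
# [0] selects the indices along dim 0, and .item() assumes a single match.
masked_index = (input_ids[0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0].item()
```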
We have `authorized_missing_keys`:

`authorized_missing_keys = [r"final_logits_bias", r"encoder\.version", r"decoder\.version"]`

https://github.com/huggingface/transformers/blob/master/src/transformers/modeling_bart.py#L942

which correctly updates `missing_keys`. Should there also be an `authorized_unexpected_keys`, which would clean up `unexpected_keys`? (A sketch of what that could look like is below.)
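A minimal sketch of how that cleanup could work, mirroring how `authorized_missing_keys` is consumed; `authorized_unexpected_keys` is the name proposed here, not an existing attribute:

```python
import re

# Hypothetical counterpart to authorized_missing_keys, as proposed above:
authorized_unexpected_keys = [r"encoder\.version", r"decoder\.version"]

# Inside from_pretrained(), the reported keys could be filtered the same
# way missing_keys is. Note that re.search also matches 'model.'-prefixed
# variants such as 'model.encoder.version'.
unexpected_keys = ["model.encoder.version", "model.decoder.version"]
for pattern in authorized_unexpected_keys:
    unexpected_keys = [k for k in unexpected_keys if re.search(pattern, k) is None]

print(unexpected_keys)  # -> []
```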
(note: I re-edited this issue once I understood it better, to save readers' time; the history is there if someone needs it)
And I found another variety of it, this time for ['model.encoder.version', 'model.decoder.version']:
```
tests/test_modeling_bart.py::BartModelIntegrationTests::test_mnli_inference
Some weights of the model checkpoint at facebook/bart-large-mnli were not used when initializing BartForSequenceClassification: ['model.encoder.version', 'model.decoder.version']
- This IS expected if you are initializing BartForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BartForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
PASSED
```
The simplest and cleanest way would probably be to simply remove these two variables from the state dict, wouldn't it? If reconverting the checkpoint, you would have to check that it is exactly identical to the previous one, which sounds like more of a pain and more error-prone than simply doing that.
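For illustration, stripping the stale keys from the checkpoint could look something like this (`pytorch_model.bin` is a placeholder path, not a confirmed location):

```python
import torch

# Load the checkpoint, drop the stale version markers, and save it back.
sd = torch.load("pytorch_model.bin", map_location="cpu")
for key in ("encoder.version", "decoder.version"):
    sd.pop(key, None)  # tolerate checkpoints that lack the key
torch.save(sd, "pytorch_model.bin")
```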
Done. Also converted weights to fp16.