Bert Checkpoint Breaks 3.0.2 -> 3.1.0 due to new buffer in BertEmbeddings
Hi,
Thanks for the great library. I noticed this line being added (https://github.com/huggingface/transformers/blob/v3.1.0/src/transformers/modeling_bert.py#L190) in the latest update.
It breaks checkpoints that were saved when this line wasn’t there.
Missing key(s) in state_dict: "generator_model.electra.embeddings.position_ids", "discriminator_model.electra.embeddings.position_ids".
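For context, a minimal sketch of how the failure surfaces. The checkpoint path and the plain BertConfig/BertModel are assumptions for illustration; the original report uses an ELECTRA wrapper around BERT embeddings.

```python
import torch
from transformers import BertConfig, BertModel

# Model built with transformers v3.1.0, which registers the new
# "embeddings.position_ids" buffer in BertEmbeddings.__init__.
model = BertModel(BertConfig())

# Hypothetical checkpoint saved with transformers v3.0.2, before the buffer existed.
state_dict = torch.load("old_checkpoint.bin", map_location="cpu")

# Strict loading (the default) raises because the buffer key is absent:
model.load_state_dict(state_dict)
# RuntimeError: Error(s) in loading state_dict for BertModel:
#     Missing key(s) in state_dict: "embeddings.position_ids"
```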
You can also use the load_state_dict method with the strict option set to False:

I think it's safe to use model.load_state_dict(state_dict, strict=False) if the only missing information is the position_ids buffer. This tensor is indeed used by the model, but it's just a constant tensor containing a list of integers from 0 to the maximum number of position embeddings. The tensor is first created in the constructor of the BertEmbeddings class, in this line: https://github.com/huggingface/transformers/blob/fcf83011dffce3f2e8aad906f07c1ec14668f877/src/transformers/models/bert/modeling_bert.py#L182

As such, it's not really part of the optimizable parameters of the model. This means that it doesn't matter if position_ids is not available when calling load_state_dict, because the line above will create it anyway in the constructor with the required values.
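A minimal sketch of this workaround, reusing the same hypothetical checkpoint path and plain BertModel from the earlier example:

```python
import torch
from transformers import BertConfig, BertModel

model = BertModel(BertConfig())
state_dict = torch.load("old_checkpoint.bin", map_location="cpu")  # hypothetical v3.0.2 checkpoint

# strict=False records the missing key instead of raising an error.
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print(missing)     # ['embeddings.position_ids'] -- only the constant buffer
print(unexpected)  # []

# The buffer is recreated in BertEmbeddings.__init__ regardless, roughly:
#   self.register_buffer("position_ids",
#                        torch.arange(config.max_position_embeddings).expand((1, -1)))
# so the loaded model behaves the same as one saved with the buffer present.
```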