Loading fine-tuned BertModel fails due to prefix error
I am loading a pretrained BERT model with BertModel.from_pretrained because I feed the pooled_output representation directly to a loss, without a head. After fine-tuning the model, I save it as in run_classifier.py.
Afterwards, I want to load the fine-tuned model, again without a head, so I’m using BertModel.from_pretrained model again to initialize it, this time from the directory where the config and model files are stored.
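Roughly, the flow looks like this (a minimal sketch; the output directory and file names are assumptions, chosen to mirror what run_classifier.py writes and what from_pretrained expects to find in a directory):

```python
import os
import torch
from pytorch_pretrained_bert import BertModel

# Load the pretrained encoder without a head; the forward pass returns
# (encoded_layers, pooled_output), and the loss is computed on pooled_output.
model = BertModel.from_pretrained("bert-base-uncased")

# ... fine-tune directly on pooled_output ...

# Save as in run_classifier.py (paths are assumptions for illustration).
output_dir = "finetuned_bert"
os.makedirs(output_dir, exist_ok=True)
torch.save(model.state_dict(), os.path.join(output_dir, "pytorch_model.bin"))
with open(os.path.join(output_dir, "bert_config.json"), "w") as f:
    f.write(model.config.to_json_string())

# Reload later, again without a head -- this is the step that fails below.
model = BertModel.from_pretrained(output_dir)
```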
When trying to load the pretrained model, none of the weights are found and I get:
Weights of BertModel not initialized from pretrained model: ['bert.embeddings.word_embeddings.weight', 'bert.embeddings.position_embeddings.weight', 'bert.embeddings.token_type_embeddings.weight', 'bert.embeddings.LayerNorm.weight', 'bert.embeddings.LayerNorm.bias', 'bert.encoder.layer.0.attention.self.query.weight', 'bert.encoder.layer.0.attention.self.query.bias', 'bert.encoder.layer.0.attention.self.key.weight', ...]
This seems to be due to this line in modeling.py. Since a bare BertModel has no bert attribute (in contrast to the BERT models with a head), from_pretrained loads with the bert. prefix instead of the empty '' prefix, so none of the fine-tuned model's weights are found: the keys in its saved state_dict carry no bert. prefix.
If I change this line to additionally check whether we are loading a fine-tuned model, then it works:

load(model, prefix='' if hasattr(model, 'bert') or pretrained_model_name not in PRETRAINED_MODEL_ARCHIVE_MAP else 'bert.')
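The mismatch is easy to see by comparing state_dict keys (a quick check, assuming pytorch-pretrained-bert is installed): a bare BertModel saves its weights without the bert. prefix, so a lookup that prepends bert. finds nothing in the fine-tuned checkpoint.

```python
from pytorch_pretrained_bert import BertModel, BertForSequenceClassification

# A bare BertModel registers its parameters at the top level...
bare = BertModel.from_pretrained("bert-base-uncased")
print(next(iter(bare.state_dict())))
# -> 'embeddings.word_embeddings.weight'

# ...while the headed models keep the encoder under a `bert` attribute,
# matching the key layout of the pretrained checkpoints on AWS.
headed = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
print(next(iter(headed.state_dict())))
# -> 'bert.embeddings.word_embeddings.weight'
```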
Does this make sense? Let me know if I'm using BertModel.from_pretrained in the wrong way, or if I should be using a different model class for fine-tuning when I just care about the pooled_output representation.

Actually Sebastian, since the model you save and the model you load are instances of the same BertModel class, you can also simply use the standard PyTorch serialization practice (we only have a special from_pretrained loading function to be able to load various types of models using the same pre-trained model stored on AWS). Just build a new BertModel using the configuration file you saved. Here is a snippet:
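Something along these lines (the paths are placeholders for wherever you saved the config and weights during fine-tuning):

```python
import torch
from pytorch_pretrained_bert import BertConfig, BertModel

# Rebuild the architecture from the saved configuration file.
config = BertConfig.from_json_file("finetuned_bert/bert_config.json")
model = BertModel(config)

# Load the fine-tuned weights with plain PyTorch serialization,
# bypassing from_pretrained and its prefix logic entirely.
state_dict = torch.load("finetuned_bert/pytorch_model.bin", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
```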
Hi All,
I am facing the following issue while loading a pretrained BERT sequence model with my own data:
RuntimeError: Error(s) in loading state_dict for DataParallel:
Missing key(s) in state_dict: "module.out.weight", "module.out.bias".
Unexpected key(s) in state_dict: "bert.embeddings.word_embeddings.weight", "bert.embeddings.position_embeddings.weight", "bert.embeddings.token_type_embeddings.weight", "bert.embeddings.LayerNorm.weight", "bert.embeddings.LayerNorm.bias", "bert.encoder.layer.0.attention.self.query.weight", "bert.encoder.layer.0.attention.self.query.bias", "bert.encoder.layer.0.attention.self.key.weight", "bert.encoder.layer.0.attention.self.key.bias", "bert.encoder.layer.0.attention.self.value.weight", "bert.encoder.layer.0.attention.self.value.bias", "bert.encoder.layer.0.attention.output.dense.weight", "bert.encoder.layer.0.attention.output.dense.bias", "bert.encoder.layer.0.attention.output.LayerNorm.weight", "bert.encoder.layer.0.attention.output.LayerNorm.bias", "bert.encoder.layer.0.intermediate.dense.weight", "bert.encoder.layer.0.intermediate.dense.bias", "bert.encoder.layer.0.output.dense.weight", "bert.encoder.layer.0.output.dense.bias", "bert.encoder.layer.0.output.LayerNorm.weight", "bert.encoder.layer.0.output.LayerNorm.bias", "bert.encoder.layer.1.attention.self.query.weight", "bert.encoder.layer.1.attention.self.query.bias", "bert.encoder.layer.1.attention.self.key.weight", "bert.encoder.layer.1.attention.self.key.bias", "bert.encoder.layer.1.attention.self.value.weight", "bert.encoder.layer.1.attention.self.value.bias", "bert.encoder.layer.1.attention.output.dense.weight", "bert.encoder.layer.1.attention.output.dense.bias", "bert.encoder.layer.1.attention.output.LayerNorm…
Any idea about this error?
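The traceback itself points at the shape of the problem: the model being loaded is wrapped in torch.nn.DataParallel, so it expects every key to carry a module. prefix (including the module.out.* head weights), while the checkpoint holds plain bert.* encoder keys. A generic way to inspect and remap the prefixes, sketched with placeholder names (the checkpoint path, `model`, and the remapping target are assumptions to adapt to your own classifier):

```python
import torch

# Inspect both key sets before loading; the mismatch is usually obvious.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
print(list(state_dict)[:3])            # e.g. ['bert.embeddings.word_embeddings.weight', ...]
# print(list(model.state_dict())[:3])  # e.g. ['module.out.weight', ...]

# If the model's BERT encoder sits at module.bert, remapping the prefix
# aligns the encoder weights; strict=False leaves the head (module.out.*),
# which is absent from the checkpoint, randomly initialized.
remapped = {"module." + k: v for k, v in state_dict.items()}
# model.load_state_dict(remapped, strict=False)

# Alternatively, load into the unwrapped module so "module." is not expected:
# model.module.load_state_dict(state_dict, strict=False)
```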