Loading fine-tuned BertModel fails due to prefix error
I am loading a pretrained BERT model with BertModel.from_pretrained because I feed the pooled_output representation directly to a loss, without a head. After fine-tuning the model, I save it as in run_classifier.py.
Afterwards, I want to load the fine-tuned model, again without a head, so I’m using BertModel.from_pretrained model again to initialize it, this time from the directory where the config and model files are stored.
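Roughly, the flow looks like this (a minimal sketch; the output directory and file names are assumptions, chosen to mirror what run_classifier.py writes and what from_pretrained expects to find in a directory):

```python
import os
import torch
from pytorch_pretrained_bert import BertModel

# Load the pretrained encoder without a head; the forward pass returns
# (encoded_layers, pooled_output), and the loss is computed on pooled_output.
model = BertModel.from_pretrained("bert-base-uncased")

# ... fine-tune directly on pooled_output ...

# Save as in run_classifier.py (paths are assumptions for illustration).
output_dir = "finetuned_bert"
os.makedirs(output_dir, exist_ok=True)
torch.save(model.state_dict(), os.path.join(output_dir, "pytorch_model.bin"))
with open(os.path.join(output_dir, "bert_config.json"), "w") as f:
    f.write(model.config.to_json_string())

# Reload later, again without a head -- this is the step that fails below.
model = BertModel.from_pretrained(output_dir)
```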
When trying to load the pretrained model, none of the weights are found and I get:
Weights of BertModel not initialized from pretrained model: ['bert.embeddings.word_embeddings.weight', 'bert.embeddings.position_embeddings.weight', 'bert.embeddings.token_type_embeddings.weight', 'bert.embeddings.LayerNorm.weight', 'bert.embeddings.LayerNorm.bias', 'bert.encoder.layer.0.attention.self.query.weight', 'bert.encoder.layer.0.attention.self.query.bias', 'bert.encoder.layer.0.attention.self.key.weight', ...]
This seems to be due to this line in modeling.py. Since a bare BertModel has no bert attribute (in contrast to the BERT models with a head), from_pretrained loads with the bert. prefix instead of the empty '' prefix, so none of the fine-tuned model's weights are found: the keys in its saved state_dict carry no bert. prefix.
If I change this line to additionally check whether we are loading a fine-tuned model, then it works:

load(model, prefix='' if hasattr(model, 'bert') or pretrained_model_name not in PRETRAINED_MODEL_ARCHIVE_MAP else 'bert.')
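The mismatch is easy to see by comparing state_dict keys (a quick check, assuming pytorch-pretrained-bert is installed): a bare BertModel saves its weights without the bert. prefix, so a lookup that prepends bert. finds nothing in the fine-tuned checkpoint.

```python
from pytorch_pretrained_bert import BertModel, BertForSequenceClassification

# A bare BertModel registers its parameters at the top level...
bare = BertModel.from_pretrained("bert-base-uncased")
print(next(iter(bare.state_dict())))
# -> 'embeddings.word_embeddings.weight'

# ...while the headed models keep the encoder under a `bert` attribute,
# matching the key layout of the pretrained checkpoints on AWS.
headed = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
print(next(iter(headed.state_dict())))
# -> 'bert.embeddings.word_embeddings.weight'
```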
Does this make sense? Let me know if I'm using BertModel.from_pretrained in the wrong way, or if I should be using a different model class for fine-tuning when I just care about the pooled_output representation.

Actually Sebastian, since the model you save and the model you load are instances of the same BertModel class, you can also simply use the standard PyTorch serialization practice (we only have a special from_pretrained loading function to be able to load various types of models using the same pre-trained model stored on AWS). Just build a new BertModel using the configuration file you saved. Here is a snippet:
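Something along these lines (the paths are placeholders for wherever you saved the config and weights during fine-tuning):

```python
import torch
from pytorch_pretrained_bert import BertConfig, BertModel

# Rebuild the architecture from the saved configuration file.
config = BertConfig.from_json_file("finetuned_bert/bert_config.json")
model = BertModel(config)

# Load the fine-tuned weights with plain PyTorch serialization,
# bypassing from_pretrained and its prefix logic entirely.
state_dict = torch.load("finetuned_bert/pytorch_model.bin", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
```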
Hi All,
I am facing the following issue while loading a pretrained BERT sequence model with my own data:
RuntimeError: Error(s) in loading state_dict for DataParallel:
Missing key(s) in state_dict: "module.out.weight", "module.out.bias".
Unexpected key(s) in state_dict: "bert.embeddings.word_embeddings.weight", "bert.embeddings.position_embeddings.weight", "bert.embeddings.token_type_embeddings.weight", "bert.embeddings.LayerNorm.weight", "bert.embeddings.LayerNorm.bias", "bert.encoder.layer.0.attention.self.query.weight", "bert.encoder.layer.0.attention.self.query.bias", "bert.encoder.layer.0.attention.self.key.weight", "bert.encoder.layer.0.attention.self.key.bias", "bert.encoder.layer.0.attention.self.value.weight", "bert.encoder.layer.0.attention.self.value.bias", "bert.encoder.layer.0.attention.output.dense.weight", "bert.encoder.layer.0.attention.output.dense.bias", "bert.encoder.layer.0.attention.output.LayerNorm.weight", "bert.encoder.layer.0.attention.output.LayerNorm.bias", "bert.encoder.layer.0.intermediate.dense.weight", "bert.encoder.layer.0.intermediate.dense.bias", "bert.encoder.layer.0.output.dense.weight", "bert.encoder.layer.0.output.dense.bias", "bert.encoder.layer.0.output.LayerNorm.weight", "bert.encoder.layer.0.output.LayerNorm.bias", "bert.encoder.layer.1.attention.self.query.weight", "bert.encoder.layer.1.attention.self.query.bias", "bert.encoder.layer.1.attention.self.key.weight", "bert.encoder.layer.1.attention.self.key.bias", "bert.encoder.layer.1.attention.self.value.weight", "bert.encoder.layer.1.attention.self.value.bias", "bert.encoder.layer.1.attention.output.dense.weight", "bert.encoder.layer.1.attention.output.dense.bias", "bert.encoder.layer.1.attention.output.LayerNorm…
Any idea about this error?
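The traceback itself points at the shape of the problem: the model being loaded is wrapped in torch.nn.DataParallel, so it expects every key to carry a module. prefix (including the module.out.* head weights), while the checkpoint holds plain bert.* encoder keys. A generic way to inspect and remap the prefixes, sketched with placeholder names (the checkpoint path, `model`, and the remapping target are assumptions to adapt to your own classifier):

```python
import torch

# Inspect both key sets before loading; the mismatch is usually obvious.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
print(list(state_dict)[:3])            # e.g. ['bert.embeddings.word_embeddings.weight', ...]
# print(list(model.state_dict())[:3])  # e.g. ['module.out.weight', ...]

# If the model's BERT encoder sits at module.bert, remapping the prefix
# aligns the encoder weights; strict=False leaves the head (module.out.*),
# which is absent from the checkpoint, randomly initialized.
remapped = {"module." + k: v for k, v in state_dict.items()}
# model.load_state_dict(remapped, strict=False)

# Alternatively, load into the unwrapped module so "module." is not expected:
# model.module.load_state_dict(state_dict, strict=False)
```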