AutoModel.from_config loads random parameter values.
🐛 Bug
Information
Model I am using (Bert, XLNet …): Bert
Language I am using the model on (English, Chinese …): English
The problem arises when using:
- the official example scripts: (give details below)
- my own modified scripts: (give details below)
Model parameters are (apparently) randomly initialized when using `AutoModel.from_config`.
The task I am working on is:
- an official GLUE/SQuAD task: (give the name)
- my own task or dataset: (give details below)
To reproduce
Steps to reproduce the behavior:
```shell
git clone https://github.com/gkutiel/transformers-bug
cd transformers-bug
pipenv shell
pipenv install
python main.py
```
```python
from transformers import (
    AutoModel,
    AutoConfig,
)

pretrained = 'bert-base-uncased'

model_from_pretrained = AutoModel.from_pretrained(pretrained)
model_from_config = AutoModel.from_config(AutoConfig.from_pretrained(pretrained))

model_from_pretrained_params = list(model_from_pretrained.parameters())
model_from_config_params = list(model_from_config.parameters())

assert len(model_from_pretrained_params) == len(model_from_config_params)

model_from_pretrained_first_param = model_from_pretrained_params[0][0][0]
model_from_config_first_param = model_from_config_params[0][0][0]

assert model_from_pretrained_first_param == model_from_config_first_param, (
    f'{model_from_pretrained_first_param} != {model_from_config_first_param}'
)
```
Expected behavior
An assertion error should not happen.
Environment info
- transformers version: 2.10.0
- Platform: MacOS
- Python version: 3.6
- PyTorch version (GPU?):
- Tensorflow version (GPU?):
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
Issue Analytics
- State:
- Created: 3 years ago
- Comments: 8 (3 by maintainers)
Top GitHub Comments
This is expected behaviour, but I understand your confusion.
`AutoModel.from_pretrained(pretrained)` actually loads the pretrained weights. It looks up the mapping and locations of the config file and the weights, and loads both.
With `AutoModel.from_config(AutoConfig.from_pretrained(pretrained))`, the pretrained weights are never requested. You request the pretrained config (essentially the pretraining settings for the architecture) and (randomly) initialise an AutoModel from that config; the weights are never requested and thus never loaded.
This means that both initialised models will have the same architecture, the same config, but different weights. The former has pretrained weights, the latter is randomly initialised.
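The contrast can be sketched with a tiny locally-saved model. The small `BertConfig` below is an illustrative assumption so the example needs no download; the issue itself used `bert-base-uncased`, where the same behaviour holds.

```python
import tempfile
import torch
from transformers import AutoModel, BertConfig, BertModel

# Deliberately tiny config so nothing is downloaded (illustrative assumption).
config = BertConfig(
    vocab_size=100, hidden_size=32, num_hidden_layers=2,
    num_attention_heads=2, intermediate_size=64,
)
saved = BertModel(config)

with tempfile.TemporaryDirectory() as tmp:
    saved.save_pretrained(tmp)

    # from_pretrained: loads the weights saved on disk.
    loaded = AutoModel.from_pretrained(tmp)
    # from_config: same architecture, but freshly random-initialised weights.
    random_init = AutoModel.from_config(config)

    # Saved vs loaded weights round-trip exactly; a fresh random
    # initialisation almost surely differs from them.
    same = torch.equal(next(saved.parameters()), next(loaded.parameters()))
    different = torch.equal(next(saved.parameters()), next(random_init.parameters()))
```

The reproduction script above fails for exactly this reason: the first parameter of the `from_config` model is freshly sampled rather than loaded from the checkpoint.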
I think what you expected or wanted is actually this, which loads the pretrained weights while taking a pretrained config into account (however, this is practically the same as the first option):
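The snippet this comment referred to was elided from the page. A plausible reconstruction (an assumption, again shown with a tiny locally-saved model to avoid a download) passes the config explicitly to `from_pretrained`:

```python
import tempfile
import torch
from transformers import AutoConfig, AutoModel, BertConfig, BertModel

# Tiny stand-in model saved locally (illustrative assumption; the issue
# used the 'bert-base-uncased' checkpoint instead of a temp directory).
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
original = BertModel(config)

with tempfile.TemporaryDirectory() as tmp:
    original.save_pretrained(tmp)
    # Weights are loaded AND the explicit config is respected -- practically
    # the same as calling from_pretrained(tmp) on its own.
    model = AutoModel.from_pretrained(tmp, config=AutoConfig.from_pretrained(tmp))
    weights_match = torch.equal(next(original.parameters()),
                                next(model.parameters()))
```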
Hope that helps.
Oh, go ahead! You know the library better than I do so your judgement of where to add a note is better.