AutoModel.from_config loads random parameter values.
🐛 Bug
Information
Model I am using (Bert, XLNet …): Bert
Language I am using the model on (English, Chinese …): English
The problem arises when using:
- the official example scripts: (give details below)
- my own modified scripts: (give details below)
Model parameters are (apparently) randomly initialized when using `AutoModel.from_config`.
The task I am working on is:
- an official GLUE/SQuAD task: (give the name)
- my own task or dataset: (give details below)
To reproduce
Steps to reproduce the behavior:
```shell
git clone https://github.com/gkutiel/transformers-bug
cd transformers-bug
pipenv shell
pipenv install
python main.py
```
```python
from transformers import (
    AutoModel,
    AutoConfig,
)

pretrained = 'bert-base-uncased'

model_from_pretrained = AutoModel.from_pretrained(pretrained)
model_from_config = AutoModel.from_config(AutoConfig.from_pretrained(pretrained))

model_from_pretrained_params = list(model_from_pretrained.parameters())
model_from_config_params = list(model_from_config.parameters())

assert len(model_from_pretrained_params) == len(model_from_config_params)

model_from_pretrained_first_param = model_from_pretrained_params[0][0][0]
model_from_config_first_param = model_from_config_params[0][0][0]

assert model_from_pretrained_first_param == model_from_config_first_param, (
    f'{model_from_pretrained_first_param} != {model_from_config_first_param}'
)
```
Expected behavior
An assertion error should not happen.
Environment info
- transformers version: 2.10.0
- Platform: MacOS
- Python version: 3.6
- PyTorch version (GPU?):
- Tensorflow version (GPU?):
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
Issue Analytics
- State:
- Created: 3 years ago
- Comments: 8 (3 by maintainers)
Top GitHub Comments
This is expected behaviour, but I understand your confusion.
`AutoModel.from_pretrained(pretrained)` actually loads the pretrained weights. It looks up the mapping and locations of the config file and the weights, and loads both.
With `AutoModel.from_config(AutoConfig.from_pretrained(pretrained))`, the pretrained weights are never requested. You request the pretrained config (essentially the pretraining settings for the architecture) and (randomly) initialise an AutoModel from that config; the weights are never requested and thus never loaded.
This means that both initialised models will have the same architecture, the same config, but different weights. The former has pretrained weights, the latter is randomly initialised.
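The contrast can be sketched with a tiny locally-saved model. The small `BertConfig` below is an illustrative assumption so the example needs no download; the issue itself used `bert-base-uncased`, where the same behaviour holds.

```python
import tempfile
import torch
from transformers import AutoModel, BertConfig, BertModel

# Deliberately tiny config so nothing is downloaded (illustrative assumption).
config = BertConfig(
    vocab_size=100, hidden_size=32, num_hidden_layers=2,
    num_attention_heads=2, intermediate_size=64,
)
saved = BertModel(config)

with tempfile.TemporaryDirectory() as tmp:
    saved.save_pretrained(tmp)

    # from_pretrained: loads the weights saved on disk.
    loaded = AutoModel.from_pretrained(tmp)
    # from_config: same architecture, but freshly random-initialised weights.
    random_init = AutoModel.from_config(config)

    # Saved vs loaded weights round-trip exactly; a fresh random
    # initialisation almost surely differs from them.
    same = torch.equal(next(saved.parameters()), next(loaded.parameters()))
    different = torch.equal(next(saved.parameters()), next(random_init.parameters()))
```

The reproduction script above fails for exactly this reason: the first parameter of the `from_config` model is freshly sampled rather than loaded from the checkpoint.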
I think what you expected or wanted is actually this, which loads the pretrained weights while taking a pretrained config into account (however, this is practically the same as the first option):
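The snippet this comment referred to was elided from the page. A plausible reconstruction (an assumption, again shown with a tiny locally-saved model to avoid a download) passes the config explicitly to `from_pretrained`:

```python
import tempfile
import torch
from transformers import AutoConfig, AutoModel, BertConfig, BertModel

# Tiny stand-in model saved locally (illustrative assumption; the issue
# used the 'bert-base-uncased' checkpoint instead of a temp directory).
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
original = BertModel(config)

with tempfile.TemporaryDirectory() as tmp:
    original.save_pretrained(tmp)
    # Weights are loaded AND the explicit config is respected -- practically
    # the same as calling from_pretrained(tmp) on its own.
    model = AutoModel.from_pretrained(tmp, config=AutoConfig.from_pretrained(tmp))
    weights_match = torch.equal(next(original.parameters()),
                                next(model.parameters()))
```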
Hope that helps.
Oh, go ahead! You know the library better than I do so your judgement of where to add a note is better.