AutoModel.from_config loads random parameter values.

See original GitHub issue

🐛 Bug

Information

Model I am using (Bert, XLNet …): Bert

Language I am using the model on (English, Chinese …): English

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: model parameters are (apparently) randomly initialized when using AutoModel.from_config.

The task I am working on is:

  • an official GLUE/SQUaD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. git clone https://github.com/gkutiel/transformers-bug
  2. cd transformers-bug
  3. pipenv shell
  4. pipenv install
  5. python main.py
from transformers import (
    AutoModel,
    AutoConfig,
)

pretrained = 'bert-base-uncased'

model_from_pretrained = AutoModel.from_pretrained(pretrained)
model_from_config = AutoModel.from_config(AutoConfig.from_pretrained(pretrained))

model_from_pretrained_params = list(model_from_pretrained.parameters())
model_from_config_params = list(model_from_config.parameters())

assert len(model_from_pretrained_params) == len(model_from_config_params)

model_from_pretrained_first_param = model_from_pretrained_params[0][0][0]
model_from_config_first_param = model_from_config_params[0][0][0]

assert model_from_pretrained_first_param == model_from_config_first_param, (
    f'{model_from_pretrained_first_param} != {model_from_config_first_param}'
)

Expected behavior

No AssertionError should be raised; both models should have the same parameter values.

Environment info

  • transformers version: 2.10.0
  • Platform: macOS
  • Python version: 3.6
  • PyTorch version (GPU?):
  • Tensorflow version (GPU?):
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments:8 (3 by maintainers)

Top GitHub Comments

4 reactions
BramVanroy commented, May 30, 2020

This is expected behaviour, but I understand your confusion.

model_from_pretrained = AutoModel.from_pretrained(pretrained)

This actually loads the pretrained weights. It looks up the mapping and locations of the config file and the weights, and loads both.

model_from_config = AutoModel.from_config(AutoConfig.from_pretrained(pretrained))

Here, the pretrained weights are never requested. You fetch only the pretrained config (essentially the architecture settings used during pretraining) and initialise an AutoModel from that config - so the weights are randomly initialised and never downloaded.

This means that both initialised models will have the same architecture, the same config, but different weights. The former has pretrained weights, the latter is randomly initialised.
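This behaviour can be checked without downloading anything, using a deliberately tiny BERT config (the hyperparameter values below are arbitrary, chosen only so the example runs quickly). For a BERT config, AutoModel.from_config resolves to BertModel, so constructing BertModel directly shows the same thing: two models built from the identical config still end up with different weights, because each construction draws fresh random parameters.

```python
import torch
from transformers import BertConfig, BertModel

# Arbitrary tiny config; the same holds for bert-base-uncased,
# it would just be slower to build.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
)

# Building a model from a config never touches pretrained weights:
# each construction randomly initialises its parameters.
model_a = BertModel(config)
model_b = BertModel(config)

param_a = next(model_a.parameters())
param_b = next(model_b.parameters())

assert param_a.shape == param_b.shape       # same architecture
assert not torch.equal(param_a, param_b)    # different random weights
```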

I think what you actually expected or wanted is this, which loads the pretrained weights while taking a pretrained config into account (in practice, this is equivalent to the first option):

model_from_config = AutoModel.from_pretrained(pretrained, config=AutoConfig.from_pretrained(pretrained))

Hope that helps.
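By contrast, from_pretrained always restores stored weights exactly. A minimal round-trip sketch (again with an arbitrary tiny config, saving to a temporary directory rather than pulling from the Hub) shows that saving and reloading reproduces every parameter exactly:

```python
import tempfile
import torch
from transformers import BertConfig, BertModel

# Arbitrary tiny config so the example runs quickly.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
)
model = BertModel(config)

with tempfile.TemporaryDirectory() as tmp:
    model.save_pretrained(tmp)                 # writes config + weights to disk
    reloaded = BertModel.from_pretrained(tmp)  # loads the saved weights

    # from_pretrained restores the stored parameters exactly.
    for p_saved, p_loaded in zip(model.parameters(), reloaded.parameters()):
        assert torch.equal(p_saved, p_loaded)
```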

1 reaction
BramVanroy commented, Jun 1, 2020

> Oh, sorry @BramVanroy I didn’t see you assigned it to yourself. Do you want to add the documentation note? Maybe you have additional ideas of where it should be added?

Oh, go ahead! You know the library better than I do so your judgement of where to add a note is better.


