GPT2-large for sequence classification default num_labels differs from the default for GPT2-small and GPT2-medium
Environment info
- transformers version: 4.5.0
- Platform: Linux-5.4.0-74-generic-x86_64-with-glibc2.29
- Python version: 3.8.5
- PyTorch version (GPU?): 1.8.1+cu102 (True)
- Tensorflow version (GPU?): not installed (NA)
- Using GPU in script?: no
- Using distributed or parallel set-up in script?: no
Who can help
Models:
- gpt2: @patrickvonplaten, @LysandreJik
Information
When creating an `AutoModelForSequenceClassification` using `from_pretrained`, passing `gpt2` as the model name yields a classifier with two targets (`model.config.num_labels = 2`). Passing `gpt2-large` instead yields a regressor with a single target (`model.config.num_labels = 1`).
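The same discrepancy is visible at the config level without instantiating the models. A minimal check, assuming the hosted configs still match what is described in this report:

```python
from transformers import AutoConfig

# Compare the resolved label counts of the two checkpoints.
# Per this report, gpt2 falls back to the library default (2) while
# gpt2-large ships an explicit value of 1 in its hosted config.
print(AutoConfig.from_pretrained("gpt2").num_labels)        # 2
print(AutoConfig.from_pretrained("gpt2-large").num_labels)  # 1
```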
Model I am using: GPT-2
The problem arises when using:
- the official example scripts: (give details below)
- my own modified scripts: (give details below)
The task I am working on is:
- an official GLUE/SQuAD task: the Stanford Sentiment Treebank (SST-2)
- my own task or dataset: I found this issue while working on SST-2, but the dataset is not particularly relevant to the issue.
To reproduce
Steps to reproduce the behavior:
- Run this code:
from transformers import AutoModelForSequenceClassification
gpt2_small_features = AutoModelForSequenceClassification.from_pretrained("gpt2").score.out_features
gpt2_large_features = AutoModelForSequenceClassification.from_pretrained("gpt2-large").score.out_features
print([gpt2_small_features, gpt2_large_features])
This prints `[2, 1]`.
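As a workaround, the label count can be passed explicitly at load time, overriding whatever the checkpoint's config carries. A minimal sketch (the value 2 is just an illustrative choice for a binary task):

```python
from transformers import AutoModelForSequenceClassification

# Request two labels explicitly so gpt2-large matches gpt2 / gpt2-medium.
model = AutoModelForSequenceClassification.from_pretrained("gpt2-large", num_labels=2)
print(model.score.out_features)  # 2
```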
Expected behavior
`num_labels` should have a consistent default across different versions of GPT-2. The source code for `PretrainedConfig` suggests that this default should be 2.
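For reference, the library-level default can be checked directly. A small sketch, assuming the documented `PretrainedConfig` default of two labels:

```python
from transformers import GPT2Config, PretrainedConfig

# With no explicit num_labels or id2label, the base config defaults to 2 labels,
# and GPT2Config inherits that default.
print(PretrainedConfig().num_labels)  # 2
print(GPT2Config().num_labels)        # 2
```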
Issue Analytics
- Created 2 years ago
- Comments: 6 (2 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
This is fixed for both `gpt2-large` and `gpt2-xl`.

https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-large-config.json still has `_num_labels` of 1, whereas https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-config.json lacks the entry and so inherits the default value.