GPT2-large for sequence classification default num_labels differs from the default for GPT2-small and GPT2-medium

Environment info

  • transformers version: 4.5.0
  • Platform: Linux-5.4.0-74-generic-x86_64-with-glibc2.29
  • Python version: 3.8.5
  • PyTorch version (GPU?): 1.8.1+cu102 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Using GPU in script?: no
  • Using distributed or parallel set-up in script?: no

Who can help

Models:

Information

When creating an AutoModelForSequenceClassification with from_pretrained, passing gpt2 as the model name yields a classifier with two targets (model.config.num_labels = 2), while passing gpt2-large yields a regressor with a single target (model.config.num_labels = 1).
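
The divergence is already visible from the hosted configurations alone, before any weights are loaded. As a quick sketch (assuming the difference originates in each checkpoint's config.json rather than in the modelling code):

from transformers import AutoConfig

# num_labels is read from the checkpoint's configuration, so no weights are needed
print(AutoConfig.from_pretrained("gpt2").num_labels)        # 2
print(AutoConfig.from_pretrained("gpt2-large").num_labels)  # 1 at the time of this report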

Model I am using: GPT-2

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQuAD task: The Stanford Sentiment Treebank
  • my own task or dataset: (I found this while working on SST-2, but the dataset is not particularly relevant to the issue.)

To reproduce

Steps to reproduce the behavior:

  1. Run this code:
from transformers import AutoModelForSequenceClassification
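# `score` below is the model's classification head; its out_features mirrors config.num_labels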

gpt2_small_features = AutoModelForSequenceClassification.from_pretrained("gpt2").score.out_features
gpt2_large_features = AutoModelForSequenceClassification.from_pretrained("gpt2-large").score.out_features

print([gpt2_small_features, gpt2_large_features])

This prints [2, 1].
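
As an aside, explicitly passing num_labels when loading sidesteps the inconsistent default. A minimal sketch (not part of the original reproduction):

from transformers import AutoModelForSequenceClassification

# An explicit num_labels overrides whatever the checkpoint's config.json provides,
# so gpt2-large comes back with the same two-way classification head as gpt2.
model = AutoModelForSequenceClassification.from_pretrained("gpt2-large", num_labels=2)
print(model.config.num_labels, model.score.out_features)  # 2 2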

Expected behavior

num_labels should have a consistent default across the different GPT-2 checkpoint sizes. The source code for PretrainedConfig suggests that this should be 2.
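
That default can be sanity-checked by constructing configurations without loading any checkpoint; a small sketch:

from transformers import GPT2Config, PretrainedConfig

# num_labels is derived from id2label, which is seeded with two placeholder
# labels when nothing else is specified.
print(PretrainedConfig().num_labels)  # 2
print(GPT2Config().num_labels)        # 2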

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

1 reaction
LysandreJik commented, Sep 8, 2021

This is fixed for both gpt2-large and gpt2-xl

0 reactions
matthewfranglen commented, Sep 4, 2021

Top Results From Across the Web

  • OpenAI GPT2 - Hugging Face
    A blog on How to generate text: using different decoding methods for language ... vocab_size (int, optional, defaults to 50257) ...
  • transformers/configuration_gpt2.py at main · huggingface ...
    configuration with the defaults will yield a similar configuration to that of the GPT-2 [gpt2](https://huggingface.co/gpt2) architecture.
  • Text generation with GPT-2 - Model Differently
    In this post we will see how to generate text with models based on the Transformers architecture, and we will use this knowledge ...
  • GPT-2 Large –774M– w/Pytorch: Not that impressive | Kaggle
    In this notebook we will apply the out-of-the-box GPT-2 models (gpt2, gpt2-medium and the recently-released and ported gpt2-large) to the ...
  • GPT2 For Text Classification Using Hugging Face Transformers
    Will use cpu by default if no gpu found. model_name_or_path – Name of transformers model – will use already pretrained model. Path of ...
