GPT2-large for sequence classification default num_labels differs from the default for GPT2-small and GPT2-medium
Environment info
- transformers version: 4.5.0
- Platform: Linux-5.4.0-74-generic-x86_64-with-glibc2.29
- Python version: 3.8.5
- PyTorch version (GPU?): 1.8.1+cu102 (True)
- Tensorflow version (GPU?): not installed (NA)
- Using GPU in script?: no
- Using distributed or parallel set-up in script?: no
Who can help
Models:
- gpt2: @patrickvonplaten, @LysandreJik
Information
When creating an `AutoModelForSequenceClassification` using `from_pretrained`, passing `gpt2` as the model name yields a classifier with two targets (`model.config.num_labels = 2`). Passing `gpt2-large` instead yields a regressor with a single target (`model.config.num_labels = 1`).
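The same discrepancy is visible at the config level without instantiating the models. A minimal check, assuming the hosted configs still match what is described in this report:

```python
from transformers import AutoConfig

# Compare the resolved label counts of the two checkpoints.
# Per this report, gpt2 falls back to the library default (2) while
# gpt2-large ships an explicit value of 1 in its hosted config.
print(AutoConfig.from_pretrained("gpt2").num_labels)        # 2
print(AutoConfig.from_pretrained("gpt2-large").num_labels)  # 1
```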
Model I am using: GPT-2
The problem arises when using:
- the official example scripts: (give details below)
- my own modified scripts: (give details below)
The task I am working on is:
- an official GLUE/SQuAD task: the Stanford Sentiment Treebank (SST-2)
- my own task or dataset: I found this issue while working on SST-2, but the dataset is not particularly relevant to the issue.
To reproduce
Steps to reproduce the behavior:
- Run this code:
from transformers import AutoModelForSequenceClassification
gpt2_small_features = AutoModelForSequenceClassification.from_pretrained("gpt2").score.out_features
gpt2_large_features = AutoModelForSequenceClassification.from_pretrained("gpt2-large").score.out_features
print([gpt2_small_features, gpt2_large_features])
This prints `[2, 1]`.
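As a workaround, the label count can be passed explicitly at load time, overriding whatever the checkpoint's config carries. A minimal sketch (the value 2 is just an illustrative choice for a binary task):

```python
from transformers import AutoModelForSequenceClassification

# Request two labels explicitly so gpt2-large matches gpt2 / gpt2-medium.
model = AutoModelForSequenceClassification.from_pretrained("gpt2-large", num_labels=2)
print(model.score.out_features)  # 2
```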
Expected behavior
`num_labels` should have a consistent default across different versions of GPT-2. The source code for `PretrainedConfig` suggests that this default should be 2.
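For reference, the library-level default can be checked directly. A small sketch, assuming the documented `PretrainedConfig` default of two labels:

```python
from transformers import GPT2Config, PretrainedConfig

# With no explicit num_labels or id2label, the base config defaults to 2 labels,
# and GPT2Config inherits that default.
print(PretrainedConfig().num_labels)  # 2
print(GPT2Config().num_labels)        # 2
```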
Issue Analytics
- Created 2 years ago
- Comments: 6 (2 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
This is fixed for both `gpt2-large` and `gpt2-xl`.

https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-large-config.json still has `_num_labels` of 1, whereas https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-config.json lacks the entry and so inherits the default value.