GPT2 (pre-trained not fine-tuned) only generates additional special tokens

Environment info

transformers version: 3.5.0
Platform: Darwin-19.6.0-x86_64-i386-64bit
Python version: 3.6.3
PyTorch version (GPU?): 1.7.0 (False)
Tensorflow version (GPU?): not installed (NA)
Using GPU in script?: No
Using distributed or parallel set-up in script?: No

Who can help

@patrickvonplaten

Information

Model I am using (GPT2 / DistilGPT2):

The problem arises when using:

the official example scripts: (give details below)
my own modified scripts: (give details below)

The task I am working on is:

an official GLUE/SQUaD task: (give the name)
my own task or dataset: (give details below)

I’m using GPT2 or DistilGPT2 on MetalWOZ. When I add special tokens (even bos, eos, etc.) and prompt the model, it generates only those special tokens and no other token. For example, if I add the tokens <USER> and <SYSTEM> and prompt the model with:

“I want a pepperoni pizza with mushroom”

I get:

“I want a pepperoni pizza with mushroom <USER> <USER> <USER> <SYSTEM> <USER> <USER> <USER> <SYSTEM> <USER> <USER>”

To reproduce

Steps to reproduce the behavior:

1. Add special tokens to a GPT2 model (example below with distilgpt2, but I get the same behavior with gpt2)
2. Resize embeddings
3. Prompt model

import torch
import torch.nn.functional as F
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Register the two new special tokens with the tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('distilgpt2')
tokenizer.add_special_tokens(
    {'additional_special_tokens': ['<USER>', '<SYSTEM>']}
)

# Grow the embedding matrix so the new token ids get (randomly initialized) rows
model = GPT2LMHeadModel.from_pretrained('distilgpt2')
model.resize_token_embeddings(len(tokenizer))

# Encode the prompt as a batch of size 1
inp_tok_ids = tokenizer.encode('I want a pepperoni pizza with mushroom')
inp_tensor = torch.LongTensor(inp_tok_ids).unsqueeze(0)
model.eval()

# Sample 10 tokens, one at a time, from the full next-token distribution
with torch.no_grad():
    for _ in range(10):
        outputs = model(inp_tensor)
        logits = outputs[0][:, -1, :]  # logits at the last position
        probs = F.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1).squeeze(1)
        # Append the sampled token to the running sequence
        inp_tensor = torch.cat([inp_tensor, next_token.unsqueeze(-1)], dim=-1)

print(tokenizer.decode(inp_tensor[0]))

Expected behavior

I would expect a mix of the new special tokens and other tokens.
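
One possible explanation for the output above, offered as an assumption rather than a confirmed diagnosis: GPT-2 ties its output layer to its input embeddings, so each newly added, randomly initialized row also produces the logit for its token. If the logits of the pre-trained tokens sit far below zero while the new tokens land near zero, softmax puts almost all probability on the new tokens. The small sketch below shows that effect with made-up logits; the numbers are illustrative and were not measured from distilgpt2.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for five pre-trained tokens (strongly negative)
pretrained_logits = [-80.0, -82.5, -85.0, -90.0, -95.0]
# Hypothetical logits for the two added tokens (close to zero)
new_token_logits = [0.3, -0.2]

probs = softmax(pretrained_logits + new_token_logits)
# Total probability assigned to the added tokens
new_token_mass = sum(probs[len(pretrained_logits):])
print(f"probability mass on the added tokens: {new_token_mass:.6f}")
```

Under these assumed values, sampling with torch.multinomial would almost never pick a pre-trained token, which matches the repeated <USER> and <SYSTEM> output in the report.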

Top GitHub Comments

al3xpapangelis commented, Nov 14, 2020

Thanks @LysandreJik and @patrickvonplaten! I like @g-karthik’s suggestion; it would be nice for this behaviour to happen automatically.

g-karthik commented, Nov 13, 2020

@patrickvonplaten yes, I was thinking I’ll try to estimate the mean and covariance of the values in GPT-2’s pre-trained embeddings (across all four model sizes), assuming a Gaussian distribution, and then update the random initialization’s mean and std. dev. accordingly in the model’s _init_weights(). That way, the random initialization comes from a distribution that’s effectively “similar” to that of the pre-trained vectors, and hence decoding sequences would result in a mixture of the original tokens and added tokens.
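
The suggestion above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the maintainers' implementation: plain Python lists stand in for the embedding matrix, the helper names `estimate_mean_std` and `sample_new_rows` are hypothetical, and it fits a single scalar mean and standard deviation over all values instead of a full covariance.

```python
import random
import statistics

def estimate_mean_std(embedding_rows):
    """Pool every value in the matrix and return its mean and std. dev."""
    values = [v for row in embedding_rows for v in row]
    return statistics.fmean(values), statistics.pstdev(values)

def sample_new_rows(embedding_rows, num_new, rng):
    """Draw num_new rows from a Gaussian fitted to the existing values."""
    mean, std = estimate_mean_std(embedding_rows)
    dim = len(embedding_rows[0])
    return [[rng.gauss(mean, std) for _ in range(dim)] for _ in range(num_new)]

rng = random.Random(0)
# Toy stand-in for a pre-trained embedding matrix (100 tokens, 16 dimensions);
# the mean 0.5 and std. dev. 0.1 are arbitrary, not GPT-2 statistics.
pretrained = [[rng.gauss(0.5, 0.1) for _ in range(16)] for _ in range(100)]

# Two new rows, e.g. for <USER> and <SYSTEM>, appended after the old ones
new_rows = sample_new_rows(pretrained, num_new=2, rng=rng)
resized = pretrained + new_rows
print(len(resized), len(resized[0]))
```

With PyTorch, the same statistics could presumably be computed from the rows of `model.get_input_embeddings().weight` that existed before the resize, with the appended rows overwritten under `torch.no_grad()`. Whether this alone yields a useful mix of old and new tokens without fine-tuning is not established in this thread.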