Attention masks are ignored when using model.generate() in batch setting for GPT-2
Environment info
- transformers version: 3.3.1 and 2.1.0 (tested on both)
- Platform: Linux Azure VM
- Python version: 3.6.8
- PyTorch version (GPU?): 1.3.0 (Yes)
- Tensorflow version (GPU?): N/A
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
Who can help
Information
Model I am using (Bert, XLNet …): GPT-2
The problem arises when using:
- the official example scripts: (give details below)
- my own modified scripts: (give details below)
The task I am working on is:
- an official GLUE/SQUaD task: (give the name)
- my own task or dataset: (give details below)
To reproduce
Steps to reproduce the behavior:
import argparse
import logging
import os
import sys
import time
sys.path.append('transformers/src')
import numpy as np
import torch
import csv
import copy
from transformers import (
    GPT2LMHeadModel,
    GPT2Tokenizer,
)
from multiprocessing import Pool, cpu_count
from tqdm import tqdm

MODEL_CLASSES = {
    "gpt2": (GPT2LMHeadModel, GPT2Tokenizer),
}
def set_seed():
    np.random.seed(42)
    torch.manual_seed(42)
    torch.cuda.manual_seed_all(42)
def generate_sequences_parallel(model, tokenizer, orig_prompt_list):
    set_seed()
    proc_cnt = cpu_count() - 2
    prompt_list = copy.deepcopy(orig_prompt_list)
    max_seq_len = 128
    requires_preprocessing = False
    if not requires_preprocessing:
        # GPT-2 doesn't require preprocessing, so we don't need to parallelize that
        inputs = tokenizer(orig_prompt_list, add_special_tokens=False, return_tensors="pt", padding=True)
        input_ids = inputs["input_ids"]
        attn_masks = inputs["attention_mask"]
        max_len_input_ids = max([len(input_id) for input_id in input_ids])
        input_ids = input_ids.to('cuda')
        attn_masks = attn_masks.to('cuda')
        output_sequences = model.generate(
            input_ids=input_ids,
            max_length=10 + max_len_input_ids,
            temperature=1.0,
            top_k=0,
            top_p=0.9,
            repetition_penalty=1.0,
            do_sample=True,
            num_return_sequences=1,
            attention_mask=attn_masks,
        )
    return output_sequences
prompt_list_single = [['Good Morning Who is up with the sun Starting my morning routine with some Yoga and my mood was'], ['What do you all do to make it a great day and my mood was']]
prompt_list_batch = ['Good Morning Who is up with the sun Starting my morning routine with some Yoga and my mood was', 'What do you all do to make it a great day and my mood was']
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
model.to('cuda')
tokenizer.padding_side = "left"
# Define PAD Token = EOS Token = 50256
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = model.config.eos_token_id
single = []
for elem in prompt_list_single:
    single.append(generate_sequences_parallel(model, tokenizer, elem))

print('BATCH')
print()
batch = generate_sequences_parallel(model, tokenizer, prompt_list_batch)

# Compare each single-prompt generation against the corresponding row of the batch output.
assert torch.equal(single[0][0], batch[0])
assert torch.equal(single[1][0], batch[1])
Expected behavior
I expect the results of this script with batch size 1 to be the same as with batch size 2, but generation just ignores all of the generated attention_masks and position_ids. I've looked at #3021 and #3167, but those don't seem to offer a concrete solution. Is there some way to use GPT-2's batch generation?
Thanks!
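[Editor's note] One commonly suggested workaround for left-padded GPT-2 batches is to derive position_ids from the attention mask. Below is a minimal sketch of that idea; it is not taken from this issue, and it assumes the gpt2 checkpoint and a transformers version where the model call returns an output with a .logits attribute. It checks that the last-token logits of the padded row in a batch match those of an unpadded single pass.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.padding_side = "left"
tokenizer.pad_token = tokenizer.eos_token
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prompts = ["Good Morning Who is up with the sun", "What do you all do"]
batch = tokenizer(prompts, return_tensors="pt", padding=True)
attention_mask = batch["attention_mask"]

# Derive position_ids from the attention mask so that the padded (shorter) row
# gets the same token positions it would have in an unpadded single pass.
position_ids = attention_mask.long().cumsum(-1) - 1
position_ids.masked_fill_(attention_mask == 0, 1)  # value at padded positions is irrelevant

with torch.no_grad():
    batched_last = model(batch["input_ids"], attention_mask=attention_mask,
                         position_ids=position_ids).logits[:, -1, :]
    single = tokenizer(prompts[1], return_tensors="pt")
    single_last = model(single["input_ids"]).logits[:, -1, :]

# The last-token logits of the padded row should match the unpadded pass
# up to floating-point noise.
print(torch.allclose(batched_last[1], single_last[0], atol=1e-4))

If the logits line up, any remaining differences in sampled text come from the sampling step itself, which is what the discussion below turns to.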
On further investigation, I found that if do_sample is set to False, the batch generation works as expected, but it fails with sampling. For my project, I'm trying to get diverse sentences from GPT-2 using the same prompt, so sampling is very important. Is there a fix on the way for when do_sample = True?
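[Editor's note] A minimal sketch of the deterministic case described above, not from the thread: with do_sample=False, a left-padded batch can be compared token for token against single-prompt runs. It assumes the gpt2 checkpoint and a transformers version whose generate() derives position_ids from the attention mask, as recent releases do.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.padding_side = "left"
tokenizer.pad_token = tokenizer.eos_token
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.config.pad_token_id = model.config.eos_token_id

prompts = ["Good Morning Who is up with the sun", "What do you all do"]
batch = tokenizer(prompts, return_tensors="pt", padding=True)
batch_out = model.generate(batch["input_ids"], attention_mask=batch["attention_mask"],
                           max_length=batch["input_ids"].shape[1] + 10, do_sample=False)

for i, prompt in enumerate(prompts):
    single = tokenizer(prompt, return_tensors="pt")
    single_out = model.generate(single["input_ids"], attention_mask=single["attention_mask"],
                                max_length=single["input_ids"].shape[1] + 10, do_sample=False)
    # Strip the left padding from the batched row before comparing token ids.
    pad_len = batch["input_ids"].shape[1] - single["input_ids"].shape[1]
    print(i, torch.equal(batch_out[i, pad_len:], single_out[0]))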
Hey @rohit497,
I checked your sample and the code seems to work fine! When I reproduce your results here, the outputs look good, so I think the attention_mask is correctly applied and batch generation works.

The reason the results are not identical is because you sample from two different distributions. When you pass a single example, the softmax output has batch_size=1, while when you use a batch, the softmax output has a batch_size=2 dimension. That means that the first time you sample from a (1, vocab_size) distribution, whereas the second time you sample from a (2, vocab_size) distribution. Now, while each part of (2, vocab_size) is the same as for the single-batch passes, the sampled output can differ because torch.multinomial does not yield the same results IMO (maybe you can check that actually). I adapted the test slightly; for some reason it had a torch.manual_seed(), which might be misleading. The test only checks for argmax, as this is deterministic.

Hope this helps.