
Attention masks are ignored when using model.generate() in batch setting for GPT-2


Environment info

  • transformers version: 3.3.1 and 2.1.0 (tested on both)
  • Platform: Linux Azure VM
  • Python version: 3.6.8
  • PyTorch version (GPU?): 1.3.0 (Yes)
  • Tensorflow version (GPU?): N/A
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help

@LysandreJik @TevenLeScao

Information

Model I am using (Bert, XLNet …): GPT-2

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQUaD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

import argparse
import logging
import os
import sys
import time
sys.path.append('transformers/src')
import numpy as np
import torch
import csv
import copy

from transformers import (
	GPT2LMHeadModel,
	GPT2Tokenizer
)

from multiprocessing import Pool, cpu_count
from tqdm import tqdm

MODEL_CLASSES = {
	"gpt2": (GPT2LMHeadModel, GPT2Tokenizer),
}

def set_seed():
	np.random.seed(42)
	torch.manual_seed(42)
	torch.cuda.manual_seed_all(42)

def generate_sequences_parallel(model, tokenizer, orig_prompt_list):
	set_seed()
	proc_cnt = cpu_count() - 2
	prompt_list = copy.deepcopy(orig_prompt_list)

	max_seq_len = 128

	requires_preprocessing = False
	if not requires_preprocessing:
		# GPT-2 doesn't require preprocessing, so we don't need to parallelize that

		inputs = tokenizer(orig_prompt_list, add_special_tokens=False, return_tensors="pt", padding=True)

		input_ids = inputs["input_ids"]
		attn_masks = inputs["attention_mask"]

		max_len_input_ids = max([len(input_id) for input_id in input_ids])

	input_ids = input_ids.to('cuda')
	attn_masks = attn_masks.to('cuda')

	output_sequences = model.generate(
		input_ids=input_ids,
		max_length=10 + max_len_input_ids,
		temperature=1.0,
		top_k=0,
		top_p=0.9,
		repetition_penalty=1.0,
		do_sample=True,
		num_return_sequences=1,
		attention_mask=attn_masks
	)

	return output_sequences

prompt_list_single = [['Good Morning Who is up with the sun Starting my morning routine with some Yoga and my mood was'], ['What do you all do to make it a great day and my mood was']]
prompt_list_batch = ['Good Morning Who is up with the sun Starting my morning routine with some Yoga and my mood was', 'What do you all do to make it a great day and my mood was']

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
model.to('cuda')
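
# Pad on the left so that, together with the attention mask, the real tokens sit at
# the end of each row and generation continues directly after the prompt.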
tokenizer.padding_side = "left"

# Define PAD Token = EOS Token = 50256
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = model.config.eos_token_id


single = []
for elem in prompt_list_single:
	single.append(generate_sequences_parallel(model, tokenizer, elem))

print('BATCH')
print()

batch = generate_sequences_parallel(model, tokenizer, prompt_list_batch)

# torch.eq on a multi-element tensor cannot be used directly in an assert; compare
# the full sequences instead. The second prompt is shorter, so its batched row is
# left-padded and the padding has to be stripped before comparing.
assert torch.equal(single[0][0], batch[0])
assert torch.equal(single[1][0], batch[1][-single[1].shape[1]:])

Expected behavior

I expect the results of this script with batch size 1 to be the same as with batch size 2, but it seems to ignore all the generated attention_masks and position_ids. I’ve looked at #3021 and #3167, but those don’t seem to offer a concrete solution. Is there some way to use GPT-2’s batch generation?

Thanks!
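
For context (an illustration added here, not part of the original issue): the usual way to make left padding work with GPT-2 is to build position_ids from the attention mask, so that the pad tokens do not shift the positions of the real tokens. A minimal sketch of that derivation:

import torch

# Two left-padded prompts: the first row has two pad tokens on the left.
attention_mask = torch.tensor([[0, 0, 1, 1, 1],
                               [1, 1, 1, 1, 1]])

# A cumulative sum over the mask gives each real token its position within the prompt;
# the positions under pad tokens are arbitrary, since those tokens are masked out anyway.
position_ids = attention_mask.long().cumsum(-1) - 1
position_ids.masked_fill_(attention_mask == 0, 1)

print(position_ids)
# tensor([[1, 1, 0, 1, 2],
#         [0, 1, 2, 3, 4]])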

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
rohit497 commented on Oct 19, 2020

On further investigation, I found that if do_sample is set to False, batch generation works as expected, but it fails with sampling. For my project, I’m trying to get diverse sentences from GPT-2 using the same prompt, so sampling is very important. Is there a fix on the way for when do_sample=True?
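
A minimal way to sanity-check the do_sample=False case (a sketch added here, not part of the thread) is to compare greedy outputs for a batch against the same prompts run one at a time:

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer.padding_side = "left"
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = model.config.eos_token_id

prompts = [
    "Good Morning Who is up with the sun Starting my morning routine with some Yoga and my mood was",
    "What do you all do to make it a great day and my mood was",
]


def greedy_generate(prompt_batch):
    inputs = tokenizer(prompt_batch, return_tensors="pt", padding=True)
    return model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=inputs["input_ids"].shape[1] + 10,
        do_sample=False,
    )


batched = greedy_generate(prompts)
singles = [greedy_generate([p]) for p in prompts]

# The first prompt is the longer one, so its batched row carries no padding and can
# be compared directly; the second row has left padding that must be stripped first.
# If the attention mask and padding are handled correctly, both checks should pass.
assert torch.equal(singles[0][0], batched[0])
assert torch.equal(singles[1][0], batched[1][-singles[1].shape[1]:])

If these checks pass, any remaining discrepancy with do_sample=True comes from the sampling step rather than from the attention mask.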

0 reactions
patrickvonplaten commented on Oct 21, 2020

Hey @rohit497,

I checked your sample and the code seems to work fine! Here is how to reproduce my results:

#!/usr/bin/env python3
import torch

from transformers import GPT2LMHeadModel, GPT2Tokenizer

MODEL_CLASSES = {
    "gpt2": (GPT2LMHeadModel, GPT2Tokenizer),
}


def set_seed():
    torch.manual_seed(42)


def generate_sequences_parallel(model, tokenizer, orig_prompt_list):

    set_seed()
    inputs = tokenizer(
        orig_prompt_list, add_special_tokens=False, return_tensors="pt", padding=True
    )

    input_ids = inputs["input_ids"]
    attn_masks = inputs["attention_mask"]

    max_len_input_ids = max([len(input_id) for input_id in input_ids])

    output_sequences = model.generate(
        input_ids=input_ids,
        max_length=10 + max_len_input_ids,
        temperature=1.0,
        top_k=0,
        top_p=0.9,
        repetition_penalty=1.0,
        do_sample=True,
        num_return_sequences=1,
        attention_mask=attn_masks,
    )

    return output_sequences


prompt_list_single = [
    [
        "Good Morning Who is up with the sun Starting my morning routine with some Yoga and my mood was"
    ],
    ["What do you all do to make it a great day and my mood was"],
]
prompt_list_batch = [
    "Good Morning Who is up with the sun Starting my morning routine with some Yoga and my mood was",
    "What do you all do to make it a great day and my mood was",
]

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer.padding_side = "left"

# Define PAD Token = EOS Token = 50256
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = model.config.eos_token_id


single = []
for elem in prompt_list_single:
    single.append(generate_sequences_parallel(model, tokenizer, elem))

print("BATCH")
print()

batch = generate_sequences_parallel(model, tokenizer, prompt_list_batch)

print(tokenizer.batch_decode(batch, skip_special_tokens=True))

The outputs look good, so I think the attention_mask is correctly applied and batch generation works.

The reason the results are not identical is that you sample from two different distributions. When you pass a single example, the softmax output has batch_size=1, while with a batch it has a batch_size=2 dimension. That means the first time you sample from a (1, vocab_size) distribution, whereas the second time you sample from a (2, vocab_size) distribution. While each row of the (2, vocab_size) distribution is the same as in the single-example pass, the sampled output can differ because torch.multinomial does not necessarily yield the same results, IMO (maybe you can check that). I adapted the test slightly; the torch.manual_seed() it contained might be misleading. The test only checks the argmax, as this is deterministic.

Hope this helps.
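
A quick way to probe the torch.multinomial point above (a sketch added here, not from the thread; the exact behaviour may depend on the PyTorch version) is to draw from the same distribution once with batch size 1 and once with batch size 2 under the same seed:

import torch

# The same uniform next-token distribution, once as a batch of 1 and once as a batch of 2.
probs_single = torch.full((1, 8), 1.0 / 8)
probs_batch = torch.full((2, 8), 1.0 / 8)

torch.manual_seed(42)
draw_single = torch.multinomial(probs_single, num_samples=1)

torch.manual_seed(42)
draw_batch = torch.multinomial(probs_batch, num_samples=1)

# Even with identical seeds and identical per-row distributions, the first-row draw
# is not guaranteed to match the batch-of-1 draw, because the RNG stream can be
# consumed differently for different batch shapes.
print(draw_single[0], draw_batch[0])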
