
Inconsistent padding behavior for decoder_input_ids for Seq2Seq models

See original GitHub issue

System Info

transformers: 4.18.0
torch: 1.12.0
Python: 3.7.13
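
For convenience (not part of the original report), the environment can be sanity-checked against the versions above with a short snippet:

# Convenience check only, not from the original issue: confirm the reported versions.
import sys

import torch
import transformers

print(transformers.__version__)  # reported: 4.18.0
print(torch.__version__)         # reported: 1.12.0
print(sys.version)               # reported: Python 3.7.13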

Who can help?

@patrickvonplaten @patil-suraj

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch

models = [
    "t5-small",
    "google/mt5-small",
    "facebook/m2m100_418M",
    "facebook/wmt19-ru-en",
    "facebook/bart-base",
    "facebook/blenderbot-400M-distill",
    "google/bigbird-pegasus-large-arxiv",
    "allenai/led-base-16384",
    "microsoft/prophetnet-large-uncased"
]

for model_name in models:

    # load the seq2seq model
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.padding_side = "left"

    # sample sentence
    sample_sentence = "generate some numbers"
    encodings = tokenizer(sample_sentence,
                          padding="max_length",
                          max_length=5,
                          return_tensors="pt",
                          return_attention_mask=True,
                          truncation=True)

    # decoder input ids (with the model's default start token)
    decoder_input_ids = torch.ones(1, 1, dtype=torch.int32) * model.config.decoder_start_token_id

    # forward pass without any padding for decoder_input_ids (hence no decoder attention mask)
    outputs = model(input_ids=encodings.input_ids,
                    attention_mask=encodings.attention_mask,
                    decoder_input_ids=decoder_input_ids,
                    return_dict=True)
    next_token_logits = outputs["logits"][:, -1, :]

    # same decoder input ids but left padded + decoder attention mask
    decoder_input_ids_with_padding = torch.ones(1, 3, dtype=torch.int32) * tokenizer.pad_token_id
    decoder_input_ids_with_padding[:, -1] = model.config.decoder_start_token_id
    decoder_attn_mask = torch.zeros(1, 3)
    decoder_attn_mask[:, -1] = 1

    # forward pass with padded decoder_input_ids (hence with a decoder attention mask)
    outputs_with_padding = model(input_ids=encodings.input_ids,
                                 attention_mask=encodings.attention_mask,
                                 decoder_input_ids=decoder_input_ids_with_padding,
                                 decoder_attention_mask=decoder_attn_mask,
                                 return_dict=True)
    next_token_logits_with_padding = outputs_with_padding["logits"][:, -1, :]

    # check whether padding affects the logits
    if torch.allclose(next_token_logits, next_token_logits_with_padding, atol=1e-3):
        print(f"No issues with model: {model_name}")
    else:
        print(f"Issues with model: {model_name}")

Expected behavior

This issue concerns seq2seq models used for conditional text generation.

The output logits differ when padding is used for decoder_input_ids (together with the corresponding decoder_attention_mask). The problem affects only some models (e.g. BART, BlenderBot, Pegasus), while others show no difference (e.g. T5, MT5), so the behavior is inconsistent across seq2seq models.

To reproduce these differences, run the provided script which does the following:

  • Run one forward pass for a sample prompt (input_ids, attention_mask), passing only the model’s default decoder start token as decoder_input_ids.
  • Run a second forward pass with the same input_ids and attention_mask, but this time decoder_input_ids is left-padded to a sequence length of 3, with the same default start token as the last token; decoder_attention_mask is also passed so the padded tokens are not attended to.
  • Compare the last-token logits from the two forward passes for equivalence (with a tolerance of 1e-3).

This check is repeated for several seq2seq models to see which of them show these differences.

Ideally, we would expect padding not to cause any such differences.
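
One plausible explanation for why only some models are affected (an assumption on my part, not something confirmed in this thread) is that BART-style decoders add learned absolute position embeddings indexed purely by sequence position, so left padding shifts the start token to a different position and hence a different embedding, whereas T5-style models use relative position biases that are unaffected by the shift. A minimal sketch of that effect, using a plain nn.Embedding rather than the actual transformers internals:

# Illustration only: a toy learned absolute position embedding,
# not the real implementation of any specific model.
import torch

torch.manual_seed(0)
pos_emb = torch.nn.Embedding(num_embeddings=10, embedding_dim=16)

# Without padding, the decoder start token sits at position 0.
positions_no_pad = torch.arange(1)   # tensor([0])

# With two left pads, the same start token sits at position 2,
# even though the attention mask hides the pad tokens.
positions_padded = torch.arange(3)   # tensor([0, 1, 2])

start_emb_no_pad = pos_emb(positions_no_pad)[0]
start_emb_padded = pos_emb(positions_padded)[2]

# The start token receives a different position embedding, so the
# downstream logits can differ between the padded and unpadded calls.
print(torch.allclose(start_emb_no_pad, start_emb_padded))  # False

If that is indeed the cause, the discrepancy would be expected for any model whose decoder uses learned absolute position embeddings.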

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 12 (10 by maintainers)

Top GitHub Comments

1 reaction
ArthurZucker commented, Oct 17, 2022

Hey! 🙌 it’s on my to do list, but can’t look at it right now so feel free to do so 😀🤗

1 reaction
sgugger commented, Oct 14, 2022

Read more comments on GitHub >
