
BART model generates drastically worse output when labels are passed to forward method

See original GitHub issue

System Info

  • transformers version: 4.21.3
  • Platform: Linux-5.15.0-33-generic-x86_64-with-glibc2.35
  • Python version: 3.10.4
  • Huggingface_hub version: 0.9.1
  • PyTorch version (GPU?): 1.11.0+cu113 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes (but same issue appears even when running on CPU)
  • Using distributed or parallel set-up in script?: No

Who can help?

@patrickvonplaten @patil-suraj

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

According to the BART documentation, adding labels to the forward pass should result in the loss also being returned in the Seq2SeqLMOutput object. However, this also seems to have the very bad side effect of drastically degrading the quality of the output, and I cannot understand why this would be the case. I've used the example below, taken from the CNN/DailyMail dataset.

data = {"input": """March 14 is my favorite day to be a nerd. Across the country, math geeks in museums, schools, private groups and elsewhere gather to celebrate the number pi, approximately 3.14. That's why March 14 -- 3-14 -- is Pi Day. What's more, Albert Einstein was born on this day. A quick refresher: Pi is defined as the distance around a perfect circle, or the circumference, divided by the distance across it, or the diameter. It is also involved in calculating the area of a circle, the volume of a sphere, and many other mathematical formulas you might need in the sciences. Throughout history, people have been captivated by this number because there is no way to calculate it exactly by a simple division on your calculator. What's more, its digits go on infinitely, without any pattern in the numbers. 3.1415926535897932 ... etc. Even that many digits are more than most people would need for everyday use, but some folks have been inspired to memorize thousands of digits of pi, or even use the digits to create poetry or music. On Pi Day, one number 'reeks of mystery'. Math may be scary, but pi is not -- as evidenced by the widespread revelry on Pi Day. One might even say -- gasp! -- it's cool to like pi these days. Even the House of Representatives supported the designation of March 14 as National Pi Day in 2009. In countries where the day is written before the month, Friday is 14-3, which looks less like pi. "And so Pi Day is an acquired taste," mathematician Jonathan Borwein, at the University of Newcastle in Australia, said in an e-mail. Conveniently, "pi" sounds like "pie," and pies are round. You could celebrate Pi Day in a casual way by grabbing a slice of pastry, or pizza. If you're in enrolled in school, your math class or math department might be doing something special already. But if you happen to live in a particularly pi-happy place, you might be able to take part in some larger-scale, pi-inspired activities. Where Pi Day began. If you want to go where the day is said to be "invented," look no further than San Francisco's Exploratorium. Larry Shaw, who worked in the electronics group at the museum, began the tradition in 1988. Last year was Pi Day's 25th anniversary there. Pi Day began as a small gathering with mostly museum staff. Now it's a public pi extravaganza featuring a "Pi procession," whose attendees get a number -- 0 to 9 -- and line up in the order of pi's digits: 3.14159265 ... you get the idea. The parade ends at the "pi shrine" -- a pi symbol with digits spiraling around it embedded in the sidewalk, which was unveiled last year. For those who can't attend in person, the Exploratorium has a Second Life Pi Day event that includes "irrational exhibits, fireworks, cheerleaders, music, and dancing." The museum also lists a bunch of educational activities to teach about the concept of pi. On Pi Day, is 'pi' under attack? Where Einstein lived. On the opposite coast, the leafy university town where Albert Einstein spent the last 22 years of his life is showing community-wide exuberance for pi. Princeton, New Jersey, kicks off Pi Day weekend on Thursday night with a reading by physicist Charles Adler, then heads into a full day of activities on Friday, including a walking tour of Einstein's neighborhood and a pizza pie-making contest. The pie-eating contest takes place at McCaffrey's supermarket, while an Einstein look-alike competition will match mustaches and wild gray hair at the Princeton Public Library. 
Pi fans who have been spending the last year memorizing digits can show off and compete at the library, where the winner among 7- to 13-year-olds can take home a cool pi-hundred (That is, $314.15). The Historical Society of Princeton will have an Einstein birthday party. Tetsuya Miyamoto, inventor of the KENKEN puzzle, will speak at the library as well. """,
"output": "March 14 is my favorite day to be a nerd, because in museums, schools, private groups and elsewhere peoplegather to celebrate the number pi, approximately 3.14"}

Using the example above, run the following code

import pandas as pd
import torch
from datasets import Dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from torch.utils.data import DataLoader
train_dataset = Dataset.from_pandas(pd.DataFrame(data=[data]))

tokenizer = AutoTokenizer.from_pretrained('facebook/bart-large-cnn')
def preprocess_function(dataset):
    model_inputs = tokenizer(
        dataset["input"], max_length=1024, truncation=True, padding='do_not_pad'
    )
    # Set up the tokenizer for targets
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(
            dataset["output"], max_length=1024, truncation=True, padding='do_not_pad'
        )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_all_data = train_dataset.map(preprocess_function, batched=True)
tokenized_all_data = tokenized_all_data.remove_columns(['input', 'output'])

train_dataloader = DataLoader(tokenized_all_data)
for sample in train_dataloader:
    input_ids = torch.tensor(sample['input_ids'])
    attention_mask = torch.tensor(sample['attention_mask'])
    labels = torch.tensor(sample['labels'])
    # pad the labels up to the model's max input length (1024) with -100, the ignore index used by the loss
    labels = torch.cat((labels, (-100 * torch.ones(1024 - labels.shape[0], dtype=int))), axis=0)
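
As a side note, the same -100 label padding can also be obtained per batch with transformers' DataCollatorForSeq2Seq, which pads labels with -100 by default. This is only a minimal sketch of that alternative, not part of the reproduction above:

from transformers import DataCollatorForSeq2Seq

# pads input_ids / attention_mask with the pad token and labels with -100 within each batch
collator = DataCollatorForSeq2Seq(tokenizer, label_pad_token_id=-100)
padded_dataloader = DataLoader(tokenized_all_data, collate_fn=collator)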

model = AutoModelForSeq2SeqLM.from_pretrained('facebook/bart-large-cnn')
model.eval()
with torch.no_grad():
    output_1 = model(input_ids=input_ids.unsqueeze(0), attention_mask=attention_mask.unsqueeze(0), labels=labels.unsqueeze(0))
    output_2 = model(input_ids=input_ids.unsqueeze(0), attention_mask=attention_mask.unsqueeze(0))

Now check the output text (assume greedy decoding for now, just for quick visualisation; better options such as beam search could of course be used):

  • Output 1
print(tokenizer.decode(output_1['logits'].squeeze().argmax(axis = 1)))

MarchMarch 14 is Pi favorite day to be a nerd. says it math, schools, private groups and elsewhere, celebrateather to celebrate pi number pi. approximately 3.14.......gngngngggneralgngngngngngnggggggngngngngngnigeng.igenigenigengi.igenigenigenigenigenigenigenigenigenbergbergigenigenigengiangiangiangiangiangiangiangiangiangiangian688giangianigengiangiangiangiangiangian.giangiangiangianindaindaivalentindagiangianindaivalentificent,,,,indainda,..........,..,.,..........igenangan...angananganigen..angangigiigengiigengigigigiigengianigengiangiangiangiangiangiangiangiangiangiangiangianigiigigianigigiangigiangiangianigiigiigiigiivalentivalentivalentgiangiangiangiangiangiangiangiangianivalentivalentivalentivalentgiangiangiangiangiangiangiangiangianivalentgianivalentivalentivalentivalentivalentivalentgiangiangiangiangiangiangianivalentuperuperivalentgiangiangiangiangiangiangiangian,giangiangiangianuperiberigenigen , igenuperuper uperuperuperuperuperuperuperuperuperuperuperuperuperuper uper uper uperuperuperuperuperiberiberiberiberuperiberiberiberiberiberiberiberiberiberiberiberiberiber..gigi.........iber.giiberiberiberiberiberivalentiberiberiberiberiberbergiberiberbergbergbergbergbergbergbergbergbergbergbergbergiberiberuperuperuperiberiberiberuperuperuperuperiberuperuperiberindauperindaindaindainda,,,,,,,,,,,,.,,,,,,,,,,,,,,,,,,,umberumber,umberumber,,,,,gigiumber,umberuperuper,,,,,..,,giigg,............................,,,,,,,,,,.,,,,,.,,,,,,,,,,,,,,,,,,,gigigigigi.</s>.......................................................itableitable..........itableitable...gi.....trusttrusttrusttrust.........trust.....squsqu..................squ.................................................................gi...............................,......................................................................................................................................................................................................................................................... and...............</s> and. and and and and and and and and and..- and and and and and and and and and and and and and...

  • Output 2
print(tokenizer.decode(output_2['logits'].squeeze().argmax(axis = 1)))

MarchMarch 14 is Pi favorite day to be a nerd. Pi the country, math geeks gather museums, schools, private groups and elsewhere gather to celebrate the number pi. approximately 3.14. In's why March 14 -- 3-14 -- is Pi Day.</s>'s more, Albert Einstein was born on this day.</s> quick refresher: Pi is defined as the distance around a perfect circle, or the circumference. divided by the distance across it. or the diameter.</s> is also involved in calculating the area of a circle, the volume of a sphere, and many other mathematical formulas. might need in the sciences.</s> history, people have been captivated by this number because there is no way to calculate it exactly by a simple division on your calculator.</s>'s more, its digits go on infinitely, without any pattern in the numbers.</s>.1415926535897932... etc.</s> that many digits are more than most people would need for everyday use. but some folks have been inspired to memorize thousands of digits of pi. or even use the digits to create poetry or music.</s> Pi Day, " might'reeks of mystery', Math may be scary, but pi is not -- as evidenced by the widespread revelry on Pi Day.</s> might even say -- gasp! -- it's cool to like pi these days.</s> the House of Representatives supported the designation of March 14 as National Pi Day in 2009.</s> countries where the day is written before the month, Friday is 14-3, which looks less like pi.</s>pi so Pi Day is an acquired taste," mathematician Jonathan Borwein, at the University of Newcastle in Australia, said in an e-mail.</s>veniently, "pi" sounds like "pie," and pies are round.</s> could celebrate Pi Day in a casual way by grabbing a slice of pastry, or pizza.</s> you're in enrolled in school, your math class or math department might be doing something special already.</s> if you happen to live in a particularly pi-happy place, you might be able to take part in some larger-scale, pi-inspired activities.</s> Pi Day is.</s> you want to go where the day is said to be "invented," look no further than San Francisco's Exploratorium.</s> Shaw, who worked in the electronics group at the museum, began the tradition in 1988.</s> year was Pi Day's 25th anniversary there.</s> Day began as a small gathering with mostly museum staff.</s> it's a public pi extravaganza featuring a "Pi procession," whose attendees get a number -- 0 to 9 -- and line up in the order of pi's digits: 3.14159265... you get the idea.</s> parade ends at the "pi shrine" -- a pi symbol with digits spiraling around it embedded in the sidewalk, which was unveiled last year.</s> those who can't attend in person, the Exploratorium has a Second Life Pi Day event that includes "irrational exhibits, fireworks, cheerleaders, music, and dancing"</s> winner also lists a bunch of educational activities to teach about the concept of pi.</s> Pi Day, the 'pi' under attack?</s> Einstein spent.</s> the opposite coast, the leafy university town where Albert Einstein spent the last 22 years of his life is showing community-wide exuberance for pi.</s>, New Jersey, kicks off Pi Day weekend on Thursday night with a reading by physicist Charles Adler. then heads into a full day of activities on Friday. including a pizza tour of Einstein's neighborhood. a pizza pie-making contest.</s> winner-eating contest takes place at McCaffrey's supermarket. 
while an Einstein look-alike competition will match mustaches and wild gray hair at the Princeton Public Library.</s> fans who have been spending the last year memorizing digits can show off and compete at the library, where the winner among 7- to 13-year-olds can take home a cool pi-hundred (That is, $314.15).</s> Historical Society of Princeton will have an Einstein birthday party.</s>etsuya Miyamoto, inventor of the KENKEN puzzle, will speak at the library, well.</s>

Expected behavior

Output 1 and Output 2 above are completely different, whereas I would expect them to be the same, and Output 1 is far worse than Output 2. Why does passing or not passing the labels (see the snippet below, taken from the code above)

with torch.no_grad():
    output_1 = model(input_ids=input_ids.unsqueeze(0), attention_mask=attention_mask.unsqueeze(0), labels=labels.unsqueeze(0))
    output_2 = model(input_ids=input_ids.unsqueeze(0), attention_mask=attention_mask.unsqueeze(0))

cause such a huge difference in the generated logits, when I would expect the only difference to be that the loss is returned when the labels are passed?
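
For reference, my rough understanding of what happens inside BartForConditionalGeneration when no decoder_input_ids are passed is sketched below (the helper name is mine, not the library's): with labels, the decoder inputs are built by shifting the labels one position to the right and replacing any -100 with the pad token; without labels, the fallback is to shift input_ids instead, so the decoder is conditioned on completely different tokens in the two calls.

def build_decoder_input_ids(input_ids, labels, pad_token_id, decoder_start_token_id):
    # which tensor gets shifted is the only thing that changes between the two forward calls
    source = labels if labels is not None else input_ids
    shifted = source.new_zeros(source.shape)
    shifted[:, 1:] = source[:, :-1].clone()
    shifted[:, 0] = decoder_start_token_id
    shifted.masked_fill_(shifted == -100, pad_token_id)  # the -100 label padding becomes <pad>
    return shifted

If that is roughly right, Output 1 comes from a decoder teacher-forced on the short reference summary padded out to 1024 with <pad> tokens, while Output 2 comes from a decoder fed the shifted source article, which would explain why the two argmax decodings look nothing alike.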

On a side note, Output 2 also contains a huge number of end-of-sequence tokens </s>. Are we supposed to ignore everything after the first </s> has been generated?
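
If everything after the first </s> is indeed meant to be discarded, the sketch below is what I would use for now, assuming tokenizer.eos_token_id is the right marker to stop at:

token_ids = output_2['logits'].squeeze().argmax(axis=1).tolist()
# keep only the tokens up to (and including) the first end-of-sequence token
if tokenizer.eos_token_id in token_ids:
    token_ids = token_ids[: token_ids.index(tokenizer.eos_token_id) + 1]
print(tokenizer.decode(token_ids, skip_special_tokens=True))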

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

1 reaction
AndreaSottana commented, Oct 10, 2022

Hi @ArthurZucker, thanks so much for providing an interim solution. I can confirm your fix works fine and returns the correct loss I was looking for. Glad that we uncovered this. Thanks also for confirming how to skip </s> tokens. Feel free to close the issue or keep it open for future work at your discretion; in the meantime I will use the solution you provided.

0 reactions
ArthurZucker commented, Nov 3, 2022

Closing this as it makes more sense to keep the current behavior.


