Weird summarization results - the summary is longer than the input
🐛 Bug
Information
The summarization task is returning unexpected results. For an input of
“We have a telephony partner who is very interested in this program and may be able to help identify pilot customers.”
The result is
[{‘summary_text’: ‘“We have a telephony partner who is very interested in this program and may be able to help identify pilot customers,” the company says. “We are looking at a number of different ways to get people talking to each other,” it adds. “It's a very exciting time for us,” says the company's chief operating officer.’}]
Model I am using (Bert, XLNet …): Summarization pipeline
Language I am using the model on (English, Chinese …): English
The problem arises when using:
- the official example scripts: (give details below)
- [V ] my own modified scripts: (give details below)
The task I am working on is:
- an official GLUE/SQuAD task: (give the name)
- [V ] my own task or dataset: (give details below)
To reproduce
Steps to reproduce the behavior:
- Execute below script
!pip install -q transformers --upgrade
from transformers import pipeline
summarizer = pipeline(task="summarization")
data = "We have a telephony partner who is very interested in this program and may be able to help identify pilot customers."
print(summarizer(data))
Expected behavior
I would expect the summary 1) not to add contextual information that doesn’t exist in the input, and 2) not to be longer than the input. Arguably the input is short, but still…
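One possible way to get closer to that expectation is to cap the output length relative to the input length. This is only a sketch: the helper names `length_bounds` and `summarize_capped` and the 0.5 ratio are made up for illustration, and it assumes (not confirmed in this thread) that the default model's generation settings enforce a minimum output length larger than this short input. It does not address the invented content, only the length.

```python
def length_bounds(n_input_tokens, ratio=0.5, floor=5):
    """Return (min_length, max_length) so the summary stays shorter than the input.

    ratio and floor are arbitrary illustrative values, not library defaults.
    """
    max_length = max(1, int(n_input_tokens * ratio))
    min_length = min(floor, max_length)
    return min_length, max_length


def summarize_capped(summarizer, text, ratio=0.5):
    """Call a summarization pipeline with length limits derived from the input."""
    # Count tokens with the pipeline's own tokenizer (count includes special tokens).
    n_tokens = len(summarizer.tokenizer(text)["input_ids"])
    min_length, max_length = length_bounds(n_tokens, ratio)
    # min_length / max_length are forwarded to the model's generate() call.
    return summarizer(text, min_length=min_length, max_length=max_length)


def main():
    # Imported lazily so the helpers above can be used without transformers installed.
    from transformers import pipeline

    summarizer = pipeline(task="summarization")
    data = ("We have a telephony partner who is very interested in this program "
            "and may be able to help identify pilot customers.")
    print(summarize_capped(summarizer, data))
```

Calling `main()` reruns the script above with the cap applied. The limits are counted in output tokens, so the result may still be cut off mid-sentence.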
Environment info
Colab
Top GitHub Comments
As a side request, it would be awesome to have metrics associated with each model that is part of transformers, to help users choose the right one for their job (cc: @julien-c ).
Unfortunately, BART can only process 1024 tokens at once, so your best bet would be to split your doc into chunks, summarize each one, and concatenate the summaries.
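The split, summarize, and concatenate approach could look roughly like the sketch below. The helper names `chunk_token_ids` and `summarize_long` are hypothetical, and the 900-token chunk size is an assumed value chosen to leave headroom under the 1024-token limit, since decoding and re-tokenizing a chunk can shift the count slightly. Splitting on raw token boundaries may cut sentences in half; splitting on sentence boundaries would likely give better summaries.

```python
def chunk_token_ids(token_ids, max_tokens):
    """Split a list of token ids into consecutive chunks of at most max_tokens."""
    return [token_ids[i:i + max_tokens]
            for i in range(0, len(token_ids), max_tokens)]


def summarize_long(summarizer, text, max_tokens=900):
    """Summarize a long document chunk by chunk and join the partial summaries.

    summarizer is expected to be a transformers summarization pipeline.
    """
    tok = summarizer.tokenizer
    # Tokenize without special tokens so chunks contain only document text.
    ids = tok(text, add_special_tokens=False)["input_ids"]
    # Turn each chunk of ids back into text the pipeline can consume.
    pieces = [tok.decode(chunk, skip_special_tokens=True)
              for chunk in chunk_token_ids(ids, max_tokens)]
    summaries = [summarizer(piece)[0]["summary_text"] for piece in pieces]
    return " ".join(summaries)
```

With `summarizer = pipeline(task="summarization")`, a call such as `summarize_long(summarizer, long_doc)` returns one string made of the per-chunk summaries in document order.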