Summarization length not controlled by max_length, min_length

I am using the pretrained ctrlsum-cnndm model from transformers. I noticed that the length of the generated summary is not exactly controlled by the max_length and min_length arguments of model.generate(), and I am not sure why. It looks as if empty spaces might be counted, but I cannot confirm that. Please help. Thanks.

text1="The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct."

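from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the pretrained CTRLsum tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("hyunwoongko/ctrlsum-cnndm")
model = AutoModelForSeq2SeqLM.from_pretrained("hyunwoongko/ctrlsum-cnndm")

# Encode the article; truncation=True makes the max_length cap on the input explicit
inputs = tokenizer.encode(text1, return_tensors="pt", max_length=1024, truncation=True)
# Generate a summary with beam search under the requested length bounds
outputs = model.generate(inputs, max_length=100, min_length=50, num_beams=5, early_stopping=True)
print(tokenizer.decode(outputs[0]))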

Results:

max_length=100, min_length=50 (actually 36 words):
</s> The Eiffel Tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building. It is the tallest structure in Paris and the second tallest free-standing structure in France after the Millau Viaduct.</s>

max_length=200, min_length=100 (actually 83 words):
</s> The Eiffel Tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. It was the tallest man-made structure in the world for 41 years until the Chrysler Building in New York City was finished in 1930. It is the second tallest free-standing structure in France after the Millau Viaduct, which measures 125 metres (410 ft) on each side. The tower is now taller than the Chrysler building by 5.2 metres (17 ft)</s>
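A possible explanation, offered as a sketch rather than a confirmed diagnosis for this model: the max_length and min_length arguments of generate() bound the number of tokens in the generated sequence (special tokens such as </s> included), not the number of words. Subword tokenizers often split numbers, punctuation, and rare words into several tokens, so a summary can satisfy min_length=50 while containing only 36 words. The helper below, whose name length_report is hypothetical and not part of transformers, compares the two counts. The token ids in the demo are made up for illustration.

```python
def length_report(token_ids, decoded_text):
    """Compare the token count (what generate() bounds) with the word count.

    token_ids: list of ints for one generated sequence.
    decoded_text: the decoded string for the same sequence.
    """
    n_tokens = len(token_ids)
    # Words are approximated by whitespace splitting, as a reader would count them.
    n_words = len(decoded_text.split())
    ratio = n_tokens / n_words if n_words else None
    return {"tokens": n_tokens, "words": n_words, "tokens_per_word": ratio}


# Made-up ids for illustration only. With the snippet from the question, one
# would instead pass:
#   outputs[0].tolist()
#   tokenizer.decode(outputs[0], skip_special_tokens=True)
example_ids = [2, 0, 133, 18133, 5727, 2]
example_text = "The Eiffel Tower"

report = length_report(example_ids, example_text)
print(report)
```

If the token count reported for the real outputs falls between min_length and max_length, the arguments are behaving as documented and the mismatch is only between tokens and words.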

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 10 (3 by maintainers)

Top GitHub Comments

1 reaction
LysandreJik commented, Jun 29, 2021
1 reaction
chris-aeviator commented, Jun 28, 2021

Stalebots are so much an anti-quality measure and have not been fixed
