MT5ForConditionalGeneration has model.config.max_length=20 by default. Why?
See original GitHub issue

Environment:
- transformers version: 4.6.1
- Platform: Ubuntu 18
- Python version: 3.6
I spent a week training a T5 model with this package and couldn't figure out why the sequences produced by Trainer.evaluate were capped at 20 tokens, even though I had passed the max_length argument to the tokenizer when encoding the inputs and targets. After a long time I found the cause:
from transformers import MT5ForConditionalGeneration

model = MT5ForConditionalGeneration.from_pretrained('google/mt5-small')
model.config.max_length
# Out: 20
The generate method was being used by the Trainer because I had set predict_with_generate=True, so generation silently stopped at model.config.max_length = 20.

Please change this behaviour; it was a very hard bug to find. model.config.max_length should default to None when the model itself has no such limitation.
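For anyone hitting the same thing, here is a minimal sketch of the workaround (assuming the standard transformers generate() API; the value 128 and the prompt are just illustrative): pass max_length explicitly instead of relying on model.config.max_length.

from transformers import MT5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained('google/mt5-small')
model = MT5ForConditionalGeneration.from_pretrained('google/mt5-small')

inputs = tokenizer("summarize: The quick brown fox jumps over the lazy dog.", return_tensors="pt")
# Without an explicit max_length, generate() falls back to
# model.config.max_length, which is 20 for this checkpoint.
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))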
Issue Analytics
- Created: a year ago
- Reactions: 3
- Comments: 9 (8 by maintainers)
Sadly we cannot change this default anymore due to backward compatibility. Always having the model generate up to the maximum allowed number of tokens can also be tricky: multiple models would error out due to memory, and some models like T5 have no real maximum length at all. So I think we'll have to leave it at 20. Maybe we can improve the docs somehow.
People who are familiar with generate() should know that max_length can and should be overridden. I'll try to make the docs better here, but I don't think we should add a warning, as it would literally be shown every time someone calls generate without defining max_length.