MT5ForConditionalGeneration has model.config.max_length=20 by default. Why?
See original GitHub issue

Environment:
- transformers version: 4.6.1
- Platform: Ubuntu 18
- Python version: 3.6
I spent a week training a T5 model with this package and couldn't figure out why the sequences produced by Trainer.evaluate were capped at 20 tokens, even though I had passed the max_length argument to the tokenizer when encoding the inputs and targets. After a long time I found the cause:
from transformers import MT5ForConditionalGeneration

model = MT5ForConditionalGeneration.from_pretrained('google/mt5-small')
model.config.max_length
# Out: 20
The generate method was being used by the Trainer because I had set predict_with_generate=True, so generation silently stopped at model.config.max_length = 20.

Please change this behaviour; it was a very hard bug to find. model.config.max_length should default to None when the model itself has no such limitation.
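For anyone hitting the same thing, here is a minimal sketch of the workaround (assuming the standard transformers generate() API; the value 128 and the prompt are just illustrative): pass max_length explicitly instead of relying on model.config.max_length.

from transformers import MT5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained('google/mt5-small')
model = MT5ForConditionalGeneration.from_pretrained('google/mt5-small')

inputs = tokenizer("summarize: The quick brown fox jumps over the lazy dog.", return_tensors="pt")
# Without an explicit max_length, generate() falls back to
# model.config.max_length, which is 20 for this checkpoint.
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))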
Issue Analytics
- Created: a year ago
- Reactions: 3
- Comments: 9 (8 by maintainers)
Sadly we cannot change this default anymore due to backward compatibility. Always having the model generate up to the maximum allowed number of tokens can also be tricky: multiple models would error out due to memory, and some models like T5 have no real maximum length at all. So I think we'll have to leave it at 20. Maybe we can improve the docs somehow.
People who are familiar with generate() should know that max_length can and should be overridden. I'll try to make the docs better here, but I don't think we should add a warning, as it would literally be shown every time someone calls generate without defining max_length.