Fine-tuning T5 model
Hi, I want to fine-tune T5 for a seq2seq task and I’m using T5ForConditionalGeneration as it seems to have a language modeling head on top. As there’s no code example for this, I have lots of questions:
- Am I doing the right thing?
- I’m using the Adam optimizer. Is it ok?
- I’m a bit confused about the forward inputs in the training phase. I read this explanation over and over again and I don’t understand whether I should just use input_ids and lm_labels for the training or not. Also, somewhere in this issue someone’s mentioned that:
T5 input sequence should be formatted with [CLS] and [SEP] tokens
So which one is right? I’m super confused.
Issue Analytics
- Created: 3 years ago
- Reactions: 4
- Comments: 33 (21 by maintainers)
Top Results From Across the Web
A Full Guide to Finetuning T5 for Text2Text and Building a ...
In this article, we see a complete example of fine-tuning of T5 for generating candidate titles for articles. The model is fine-tuned ...
Fine Tuning T5 Transformer Model with PyTorch
A T5 is an encoder-decoder model. It converts all NLP problems like language translation, summarization, text generation, question-answering, to ...
Fine Tuning a T5 transformer for any Summarization Task
The T5 tuner is a pytorch lightning class that defines the data loaders, forward pass through the model, training one step, validation on ...
Top 3 Fine-Tuned T5 Transformer Models - Vennify.ai
In this article I'll discuss my top three favourite fine-tuned T5 models that are available on Hugging Face's Model Hub. T5 was published ...
mrm8488/t5-base-finetuned-break_data - Hugging Face
The T5 model was presented in Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer by Colin Raffel, Noam Shazeer, ...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hi @amitness,
For T5 summarization you will have to prepend the prefix "summarize: " to every input. But you are more or less right. All you have to do is:
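A minimal sketch of the idea (the checkpoint name and text strings are placeholders, and the target ids are passed as lm_labels, the argument name in the transformers version this issue refers to; newer releases call the same argument labels and return the loss as outputs.loss):

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

article = "the article you want to summarize ..."
summary = "the reference summary ..."

# Prefix the source text with the task prefix and tokenize both sides.
input_ids = tokenizer.encode("summarize: " + article, return_tensors="pt")
lm_labels = tokenizer.encode(summary, return_tensors="pt")

# The model shifts lm_labels internally to build the decoder inputs,
# so input_ids and lm_labels are all that is needed to get the LM loss.
outputs = model(input_ids=input_ids, lm_labels=lm_labels)
loss = outputs[0]
loss.backward()
```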
There is no need to shift the tokens as you show at the end of your comment because T5 does that automatically - see https://github.com/huggingface/transformers/blob/6af3306a1da0322f58861b1fbb62ce5223d97b8a/src/transformers/modeling_t5.py#L1063.
This is also explained in https://huggingface.co/transformers/model_doc/t5.html#training .
@amitness
E.g. in your summarization case, it would look something like:
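A sketch with placeholder strings, writing the right-shift of the target ids out by hand:

```python
import torch
from transformers import T5Tokenizer, T5Model

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5Model.from_pretrained("t5-base")

input_ids = tokenizer.encode("summarize: " + "the article text ...", return_tensors="pt")
target_ids = tokenizer.encode("the reference summary ...", return_tensors="pt")

# Build the decoder inputs by shifting the target ids one position to the right
# and prepending the pad token, which T5 uses as the decoder start token.
pad = torch.full(
    (target_ids.size(0), 1), model.config.decoder_start_token_id, dtype=target_ids.dtype
)
decoder_input_ids = torch.cat([pad, target_ids[:, :-1]], dim=1)

outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
```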
Do note that T5ForConditionalGeneration already prepends the padding token by default. The above is only necessary if you’re doing a forward pass straight from T5Model.

Regarding your question about making your own prefix: yes, you should be able to train on your own prefix. This is the whole point of T5’s text-to-text approach. You should be able to specify any problem through this kind of approach (e.g. Appendix D in the T5 paper).