
Fine-tuning T5 model

See original GitHub issue

Hi, I want to fine-tune T5 for a seq2seq task and I’m using the T5ForConditionalGeneration as it seems to have an LM decoder on top. As there’s no code example for this, I have lots of questions:

  1. Am I doing the right thing?
  2. I’m using the Adam optimizer. Is it ok?
  3. I’m a bit confused about the forward inputs in the training phase. I’ve read this explanation over and over again, and I still don’t understand whether I should just use input_ids and lm_labels for training or not. Also, somewhere in this issue someone mentioned that:

T5 input sequence should be formatted with [CLS] and [SEP] tokens

So which one is right? I’m super confused.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 4
  • Comments: 33 (21 by maintainers)

Top GitHub Comments

10 reactions
patrickvonplaten commented, May 3, 2020

Hi @amitness,

For T5 summarization you will have to prepend the prefix "summarize: " to every input. But you are more or less right. All you have to do is:

  1. Prepare the input data:
x = tokenizer.encode_plus("summarize: " + sentence, 
                          max_length=500, 
                          pad_to_max_length=True, 
                          return_tensors='pt')
  2. Prepare the labels:
lm_labels = tokenizer.encode_plus(summary, 
                                  max_length=50, 
                                  pad_to_max_length=True, 
                                  return_tensors='pt')['input_ids']
  3. For tokens that are padded (relevant whenever your labels contain padding, e.g. when training with batch_size > 1) you need to make sure that no loss is calculated on those tokens, so:
lm_labels[lm_labels == tokenizer.pad_token_id] = -100

There is no need to shift the tokens as you show at the end of your comment because T5 does that automatically - see https://github.com/huggingface/transformers/blob/6af3306a1da0322f58861b1fbb62ce5223d97b8a/src/transformers/modeling_t5.py#L1063.

This is also explained in https://huggingface.co/transformers/model_doc/t5.html#training.
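
Putting the three steps together, a full training step could look roughly like the minimal sketch below. It assumes a recent transformers release, where the tokenizer is called directly (instead of encode_plus with pad_to_max_length) and the loss target argument is named labels rather than lm_labels; sentence and summary are placeholder strings, and AdamW is used only as one reasonable default optimizer.

import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained('t5-small')
model = T5ForConditionalGeneration.from_pretrained('t5-small')
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

sentence = "The quick brown fox jumps over the lazy dog."  # placeholder source text
summary = "A fox jumps over a dog."                        # placeholder target summary

# 1. Encode the input with the task prefix, padded/truncated to a fixed length.
inputs = tokenizer("summarize: " + sentence, max_length=500,
                   padding='max_length', truncation=True, return_tensors='pt')

# 2. Encode the labels the same way.
labels = tokenizer(summary, max_length=50,
                   padding='max_length', truncation=True, return_tensors='pt').input_ids

# 3. Ignore padded positions in the loss.
labels[labels == tokenizer.pad_token_id] = -100

# The model shifts the labels right internally to build the decoder inputs
# and returns the cross-entropy loss when labels are passed.
outputs = model(input_ids=inputs.input_ids,
                attention_mask=inputs.attention_mask,
                labels=labels)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()

In a real run you would iterate this over batches from a DataLoader; the loop body stays the same.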

4 reactions
enzoampil commented, May 4, 2020

@amitness

For example, in your summarization case, it would look something like this:

from transformers import T5Tokenizer, T5Model

tokenizer = T5Tokenizer.from_pretrained('t5-small')
model = T5Model.from_pretrained('t5-small')

# Encoder input: the task prefix plus the text to summarize.
input_ids = tokenizer.encode("summarize: Hello, my dog is cute", return_tensors="pt")
# The decoder needs a start token; T5 uses the pad token for this.
decoder_input_ids = tokenizer.encode("<pad>", return_tensors="pt")
outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
last_hidden_state = outputs[0]  # decoder hidden states

Do note that T5ForConditionalGeneration already prepends the pad token (as the decoder start token) by default. The step above is only necessary if you’re doing a forward pass straight from T5Model.

Regarding your question about making your own prefix: yes, you can train with your own prefix. That is the whole point of T5’s text-to-text approach: you can frame any problem this way (see e.g. Appendix D in the T5 paper).
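
To make the custom-prefix point concrete, here is a minimal sketch with a made-up "grammar: " prefix; the prefix string and the example sentences are hypothetical, and T5 simply treats the prefix as part of the input text. As in the earlier sketch, it assumes a recent transformers release and uses T5ForConditionalGeneration so that passing labels returns a loss directly.

from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained('t5-small')
model = T5ForConditionalGeneration.from_pretrained('t5-small')

# Hypothetical custom task: grammar correction, framed as text-to-text.
source = "grammar: he go to school yesterday"
target = "He went to school yesterday."

inputs = tokenizer(source, return_tensors='pt')
labels = tokenizer(target, return_tensors='pt').input_ids
labels[labels == tokenizer.pad_token_id] = -100  # no-op here, but needed once you pad

loss = model(input_ids=inputs.input_ids,
             attention_mask=inputs.attention_mask,
             labels=labels).loss

Fine-tune on enough (source, target) pairs formatted with that prefix, and the same model can be prompted with the "grammar: " prefix at inference time.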
