
Low loss in fine-tuning, but generated answers are not correct


Hi, I am fine-tuning on a QA dataset using the Hugging Face UnifiedQA-v2 T5-large model, and the sample code is below:

# training: tokenize questions as inputs and answers as labels
model_inputs = self.tokenizer(
    questions,
    padding=True, truncation=True,
    max_length=self.tokenizer.model_max_length,
    return_tensors="pt",
).to(device)
with self.tokenizer.as_target_tokenizer():
    labels = self.tokenizer(
        answers,
        padding=True, truncation=True,
        max_length=self.tokenizer.model_max_length,
        return_tensors="pt",
    ).to(device)
# ignore pad tokens when computing the loss
labels["input_ids"][labels["input_ids"] == self.tokenizer.pad_token_id] = -100
model_inputs["labels"] = labels["input_ids"]

outputs = self.model(**model_inputs)
loss = outputs.loss


# generate: beam-search decode answers for the same questions
model_inputs = self.tokenizer(
    questions,
    padding=True, truncation=True,
    max_length=self.tokenizer.model_max_length,
    return_tensors="pt",
).to(device)

sampled_outputs = self.model.generate(
    **model_inputs,
    num_beams=4, max_length=50, early_stopping=True,
)

I can get a fairly low loss (0.41) after fine-tuning for around 5 epochs, yet the generated answers are mostly wrong (0.23 accuracy). According to the T5 documentation, generate should handle prepending the pad (decoder start) token on its own. Also, the generated answers do belong to one of the answer choices; they are just not the correct ones. I am wondering what might be the issue. Thanks!
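
For reference, here is a minimal sketch of how such an accuracy could be computed: decode the beam-search outputs and compare them to the gold answers with exact match. This is an illustration only; the exact_match_accuracy helper, the questions/answers lists, and the whitespace/lowercase normalization are assumptions, not details from the issue.

# evaluation sketch (assumed helper, not from the original issue):
# decode generated answers and compute exact-match accuracy against gold answers
import torch

def exact_match_accuracy(model, tokenizer, questions, answers, device):
    model.eval()
    with torch.no_grad():
        inputs = tokenizer(
            questions,
            padding=True, truncation=True,
            max_length=tokenizer.model_max_length,
            return_tensors="pt",
        ).to(device)
        generated = model.generate(
            **inputs, num_beams=4, max_length=50, early_stopping=True,
        )
    predictions = tokenizer.batch_decode(generated, skip_special_tokens=True)
    normalize = lambda s: " ".join(s.lower().split())
    correct = sum(
        normalize(pred) == normalize(gold)
        for pred, gold in zip(predictions, answers)
    )
    return correct / len(answers)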

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 7

Top GitHub Comments

2 reactions
cnut1648 commented on Jun 30, 2022

Oh I see. Regardless, I think the lesson I learned is that if the performance is not correlated with the loss, we can give unifiedqa more training epochs/steps. Thank you for the help all the way, @danyaljj!!

1 reaction
cnut1648 commented on Jun 29, 2022

Thanks @danyaljj! After a week's attempt I think I have somehow solved this problem. In my case, it seems that fine-tuning for more epochs works. Previously I was fine-tuning for either 5 or 10 epochs and got 0.23 accuracy; when fine-tuning for 50 epochs, I can get 0.72 accuracy. I wonder whether in your paper you also fine-tuned for a large number of epochs? Thanks!!
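
As an illustration of the fix described above (simply training for many more epochs), here is a rough sketch of a longer fine-tuning run using the transformers Seq2SeqTrainer. The checkpoint name, hyperparameter values, and the train_dataset/eval_dataset variables are assumptions for illustration, not the exact settings used in this thread.

# sketch of a longer fine-tuning run; the checkpoint, hyperparameters, and
# datasets are illustrative assumptions, not the settings reported in the thread
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "allenai/unifiedqa-v2-t5-large-1363200"  # assumed UnifiedQA-v2 checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

args = Seq2SeqTrainingArguments(
    output_dir="unifiedqa-finetuned",
    num_train_epochs=50,            # train much longer than 5-10 epochs
    per_device_train_batch_size=8,
    learning_rate=1e-4,
    evaluation_strategy="epoch",
    predict_with_generate=True,     # track generation accuracy, not only the loss
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,    # assumed tokenized question/answer datasets
    eval_dataset=eval_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    tokenizer=tokenizer,
)
trainer.train()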


Top Results From Across the Web

What should I do when my neural network doesn't learn?
This can be done by comparing the segment output to what you know to be the correct answer. This is called unit testing. …

Fine-Tuning DistilBertForSequenceClassification: Is not ...
Looking at running loss and minibatch loss is easily misleading. … Tuning and fine-tuning ML models are difficult work.

DataCamp/4 - Fine-tuning keras models.py at master - GitHub
You'll now try optimizing a model at a very low learning rate, a very high learning rate, and a "just right" learning rate. …

On the Stability of Fine-tuning BERT - OpenReview
The paper focuses on the instability phenomenon happening in the fine-tuning of BERT-like models in downstream tasks. The reasons of such instability were …

bigscience/bloom · Fine-tune the model? - Hugging Face
Hi everyone, if you have enough compute you could fine-tune BLOOM on any downstream task, but you would need enough GPU RAM …
