Low loss during fine-tuning but generated answers are not correct
Hi, I am fine-tuning on a QA dataset using the Hugging Face UnifiedQA-v2 T5-large checkpoint, and the sample code is like below:
```python
# training
model_inputs = self.tokenizer(
    questions,
    padding=True, truncation=True,
    max_length=self.tokenizer.model_max_length, return_tensors="pt",
).to(device)
with self.tokenizer.as_target_tokenizer():
    labels = self.tokenizer(
        answers,
        padding=True, truncation=True,
        max_length=self.tokenizer.model_max_length, return_tensors="pt",
    ).to(device)
# replace pad token ids in the labels with -100 so they are ignored by the loss
labels["input_ids"][labels["input_ids"] == self.tokenizer.pad_token_id] = -100
model_inputs["labels"] = labels["input_ids"]
outputs = self.model(**model_inputs)
loss = outputs.loss

# generation
model_inputs = self.tokenizer(
    questions,
    padding=True, truncation=True,
    max_length=self.tokenizer.model_max_length, return_tensors="pt",
).to(device)
sampled_outputs = self.model.generate(
    **model_inputs,
    num_beams=4, max_length=50, early_stopping=True,
)
```
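(The optimizer step is omitted above; a standard step around this loss looks roughly like the sketch below, where AdamW and the learning rate are only placeholders, not taken from the snippet.)

```python
# Sketch of the surrounding optimization step (not part of the snippet above;
# the optimizer choice and learning rate are placeholders).
from torch.optim import AdamW

optimizer = AdamW(self.model.parameters(), lr=1e-4)

optimizer.zero_grad()
loss.backward()   # backpropagate the seq2seq cross-entropy loss
optimizer.step()  # update the model parameters
```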
I can get a fairly low loss (0.41) after fine-tuning for around 5 epochs, yet the generated answers are mostly wrong (0.23 accuracy). According to the T5 docs, `generate` handles prepending the pad token as the decoder start token, so I don't prepend it myself. Also, the generated answers do belong to one of the answer choices; they are just not the correct ones.
I am wondering what might be the issue. Thanks!
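For reference, this is roughly how the generated answers are scored (exact match after decoding; `questions`, `answers`, `self.model`, and `self.tokenizer` are the same objects as in the snippet above, and exact match is only an assumption about the metric):

```python
# Minimal evaluation sketch: decode the beam-search outputs and compare them
# to the gold answers with case-insensitive exact match.
import torch

self.model.eval()
with torch.no_grad():
    model_inputs = self.tokenizer(
        questions,
        padding=True, truncation=True,
        max_length=self.tokenizer.model_max_length, return_tensors="pt",
    ).to(device)
    generated = self.model.generate(
        **model_inputs, num_beams=4, max_length=50, early_stopping=True,
    )

predictions = self.tokenizer.batch_decode(generated, skip_special_tokens=True)
exact_match = sum(
    pred.strip().lower() == gold.strip().lower()
    for pred, gold in zip(predictions, answers)
) / len(answers)
print(f"exact match: {exact_match:.2f}")
```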
Top GitHub Comments
Oh I see. Regardless, I think the lesson I learned is that if the performance is not correlated with the loss, we can give UnifiedQA more training epochs/steps. Thank you for the help all the way, @danyaljj!!
Thanks @danyaljj! After a week of attempts I think I have solved this problem. In my case, fine-tuning for more epochs works. Previously I was fine-tuning for either 5 or 10 epochs and got 0.23 accuracy; when fine-tuning for 50 epochs, I get 0.72 accuracy. I wonder, in your paper did you also fine-tune for that many epochs? Thanks!!
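For anyone who hits the same issue, a rough sketch of the setup that worked, i.e. training for many more epochs and picking the checkpoint by dev-set generation accuracy rather than by training loss (`train_one_epoch`, `evaluate_exact_match`, `train_loader`, `dev_questions`, and `dev_answers` are illustrative placeholders, not names from the original code):

```python
# Hypothetical outer loop: train long and select the best checkpoint by
# generation accuracy on a dev set instead of by training loss.
best_acc = 0.0
for epoch in range(50):  # 5-10 epochs were not enough in this case
    train_one_epoch(self.model, train_loader)  # placeholder training step
    acc = evaluate_exact_match(self.model, self.tokenizer, dev_questions, dev_answers)
    print(f"epoch {epoch}: dev exact match = {acc:.2f}")
    if acc > best_acc:
        best_acc = acc
        self.model.save_pretrained("best_checkpoint")
print(f"best dev exact match: {best_acc:.2f}")
```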