`compute_metrics` shows better results than `generate` because target data leaks
Environment info
- `transformers` version: 4.3.2
- Platform: Linux-5.8.18-050818-generic-x86_64-with-glibc2.10
- Python version: 3.8.5
- PyTorch version (GPU?): 1.7.1 (True)
- Tensorflow version (GPU?): not installed (NA)
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no
Who can help
Information
Model I am using:
- t5
The problem arises when using:
- my own modified scripts: (give details below)
The tasks I am working on is:
- my own task or dataset: (give details below)
To reproduce
Steps to reproduce the behavior:
- train model with a `compute_metrics` function to monitor metrics
- use `generate` to predict text with the trained model
Expected behavior
I expect the metrics reported by `compute_metrics` to match the metrics computed on my generated text.
More information
While training, I used `compute_metrics` to calculate the metric on my validation set every X steps. I was surprised to see that, after training, my model did not perform as expected when using the `generate` function provided by Hugging Face.
After some digging through the code, I think I understand what the problem is. `compute_metrics` takes as input `preds`, which is a collection of logits from `prediction_step`, which internally calls the model with both the inputs and the targets. This means that the target text leaks into `preds.predictions`, because `model.forward` uses the targets as input for the decoder (teacher forcing). This makes the metrics of `compute_metrics` seem much better than they really are.

In my opinion, the target data should not be used to create `preds.predictions`. Maybe the `generate` function is a better fit.
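The effect described above can be reproduced with a toy, framework-free sketch (the `step` function, token values, and accuracy metric are invented for illustration and are not part of `transformers`): under teacher forcing the model always conditions on the gold previous token, so one early mistake cannot compound, while free-running generation feeds the model's own mistakes back in.

```python
# Toy "model": predicts next_token = prev_token + 1, but is broken at 3.
def step(prev_token):
    """One decoding step: the model's (imperfect) next-token prediction."""
    return 0 if prev_token == 3 else prev_token + 1

target = [1, 2, 3, 4, 5]             # gold output sequence
decoder_inputs = [0] + target[:-1]   # teacher forcing: gold tokens shifted right

# What prediction_step effectively evaluates: the model always sees the
# *true* previous token, so the single error at 3 stays isolated.
teacher_forced = [step(t) for t in decoder_inputs]

# What generate does: the model feeds its *own* previous prediction back in,
# so the error at 3 derails every subsequent step.
generated, prev = [], 0
for _ in target:
    prev = step(prev)
    generated.append(prev)

def accuracy(preds):
    return sum(p == t for p, t in zip(preds, target)) / len(target)

print(teacher_forced, accuracy(teacher_forced))  # [1, 2, 3, 0, 5] 0.8
print(generated, accuracy(generated))            # [1, 2, 3, 0, 1] 0.6
```

The teacher-forced metric (0.8) overstates the model's real free-running quality (0.6), which is exactly the gap the issue reports between `compute_metrics` and `generate`.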
Issue Analytics
- State:
- Created 3 years ago
- Reactions: 2
- Comments: 5 (3 by maintainers)
Top GitHub Comments
Did you use the flag `--predict_with_generate`? It's there just for this: predicting using the `generate` method, and the labels are then not passed (except to compute the loss).

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
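The flag mentioned above corresponds to the `predict_with_generate` field of `Seq2SeqTrainingArguments`, consumed by `Seq2SeqTrainer`. A minimal configuration sketch (the output path is a placeholder, and model/dataset wiring is omitted):

```python
from transformers import Seq2SeqTrainingArguments

# With predict_with_generate=True, Seq2SeqTrainer's evaluation loop calls
# model.generate() and hands the generated token ids to compute_metrics,
# instead of the argmax of teacher-forced logits.
args = Seq2SeqTrainingArguments(
    output_dir="./out",          # placeholder path
    predict_with_generate=True,  # evaluate with generate(), not forced logits
)
# args is then passed to Seq2SeqTrainer along with the model, datasets,
# tokenizer, and compute_metrics function.
```

Note that this requires `Seq2SeqTrainer`; the plain `Trainer` does not support `predict_with_generate`.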
Please note that issues that do not follow the contributing guidelines are likely to be ignored.