Improve the documentation for TrainingArguments.label_names, and if possible raise an error if users misinterpret this attribute like I did
Original Issue Title: Possible typo in trainer.py: prediction_step() forgetting to exclude the loss item of the outputs dict when assigning logits
Update: I determined the root cause of my error to be an incorrect assignment of TrainingArguments.label_names. There is not a typo in Trainer.prediction_step(), as I suggested below. However, there is still an issue: see my comment for elaboration.
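For anyone hitting the same misunderstanding: label_names names keys of the batch dict passed to the model, not the class-name strings from the dataset's label feature. The sketch below (plain Python, no transformers dependency; has_labels is an illustrative stand-in for the check inside Trainer.prediction_step, not the library's exact code) shows why a wrong value silently disables label handling.

```python
# Simplified stand-in for the label-detection check in Trainer.prediction_step
# (illustrative only; the real implementation lives in transformers/trainer.py).

def has_labels(inputs, label_names):
    # label_names must name keys of the per-batch inputs dict.
    return all(inputs.get(name) is not None for name in label_names)

batch = {
    "input_ids": [[101, 2023, 102]],
    "attention_mask": [[1, 1, 1]],
    "labels": [2],
}

print(has_labels(batch, ["labels"]))      # True: "labels" is a batch key
print(has_labels(batch, ["pos", "neg"]))  # False: class names are not keys, so
                                          # evaluation treats the batch as unlabeled
```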
I was using the Trainer to fine-tune KB-Bert-Base-Swedish-Cased for multi-class SequenceClassification when I got an IndexError: tuple index out of range during the evaluation stage (I had set up the Trainer to evaluate after each epoch).
I started PDB and paused at this line in the evaluation phase:
With the debugger, I saw that loss=None, labels=None, and logits is actually a tuple with two items: the first item is the prediction loss, and the second is the actual output logits from the model's forward pass.
I think this strange assignment of the local logits variable comes from here, inside prediction_step:
As the outputs dict includes the loss, and "loss" is not in ignore_keys, the loss value in outputs gets baked into logits.
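To make that branch concrete, here is a minimal sketch (tensors replaced by plain floats and lists; names and structure are illustrative reductions, not the library's exact code) of how the unlabeled branch ends up with the loss inside logits:

```python
def prediction_step_sketch(outputs, ignore_keys, has_labels):
    """Illustrative reduction of the dict-handling in prediction_step."""
    if has_labels:
        loss = outputs["loss"]
        # The labeled branch also excludes "loss" when building logits.
        logits = tuple(v for k, v in outputs.items()
                       if k not in ignore_keys + ["loss"])
        return loss, logits
    # The unlabeled branch filters only on ignore_keys, so a "loss" entry
    # returned by the model slips into the logits tuple.
    logits = tuple(v for k, v in outputs.items() if k not in ignore_keys)
    return None, logits

outputs = {"loss": 0.37, "logits": [[0.1, 0.7, 0.2]]}
print(prediction_step_sketch(outputs, [], has_labels=False))
# (None, (0.37, [[0.1, 0.7, 0.2]])) -- the loss leaks into logits
print(prediction_step_sketch(outputs, [], has_labels=True))
# (0.37, ([[0.1, 0.7, 0.2]],))
```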
I'm pretty sure it's a typo: comparing it to a few lines above (executed when has_labels=True), the similar line is:
The above links are all from Version 4.4.2, but this possible typo is still present in master:
I haven't been able to read and grasp the code in depth, but it looks to me like either we're forgetting to ignore the "loss" key in outputs, or the return statement of prediction_step should somehow unpack the logits tuple, so that the two items in the "logits" tuple end up in loss and logits:
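A sketch of what the second option (unpacking) could look like; split_loss is a hypothetical helper for illustration, not the actual patch:

```python
def split_loss(prediction):
    # Hypothetical helper: if prediction is a (loss, logits) pair,
    # split it apart; otherwise pass the logits through unchanged.
    if isinstance(prediction, tuple) and len(prediction) == 2:
        loss, logits = prediction
        return loss, logits
    return None, prediction

print(split_loss((0.37, [[0.1, 0.7, 0.2]])))  # (0.37, [[0.1, 0.7, 0.2]])
print(split_loss([[0.1, 0.7, 0.2]]))          # (None, [[0.1, 0.7, 0.2]])
```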
For clarity, this is the stack trace showing how I encounter the tuple index error from the above typo:
In the evaluation phase, prediction_loop runs over all the batches in my dev dataset. It gets the model output/prediction for each dev batch here:
Later in prediction_loop, we concatenate each prediction batch with the previous predictions here, calling the function nested_concat:
Inside nested_concat, in the line below, new_tensors is the above-mentioned "logits" tuple.
https://github.com/huggingface/transformers/blob/6bc89ed9295443e5a3ee236ad544101752563917/src/transformers/trainer_pt_utils.py#L95
The above line makes a recursive call to nested_concat, and we arrive at the line below.
Which calls this:
And I get an index error, as it's trying to index into what is actually the loss tensor.
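The failure can be reproduced structurally without torch: the concatenation helper reads the second dimension before padding, and a scalar loss has an empty shape, so shape[1] raises IndexError: tuple index out of range. Below is a pure-Python sketch; shape and concat_sketch are illustrative stand-ins for tensor shapes and nested_concat, not the library's code.

```python
def shape(x):
    # Shape of nested lists; a scalar has shape (), like a 0-d tensor.
    return (len(x),) + shape(x[0]) if isinstance(x, list) else ()

def concat_sketch(old, new):
    if isinstance(old, tuple):
        # Recurse over tuples, as nested_concat does.
        return tuple(concat_sketch(o, n) for o, n in zip(old, new))
    # Mimic the padding logic that reads the second dimension before
    # concatenating; for a 0-d scalar, shape(...) is (), so [1] raises.
    _ = max(shape(old)[1], shape(new)[1])
    return old + new  # list concat stands in for concatenation along dim 0

batch1 = (0.41, [[0.1, 0.7, 0.2]])  # (scalar loss, logits): the bad tuple
batch2 = (0.37, [[0.3, 0.3, 0.4]])

try:
    concat_sketch(batch1, batch2)
except IndexError as e:
    print("IndexError:", e)  # tuple index out of range
```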
Issue Analytics
- Created 2 years ago
- Comments: 7 (3 by maintainers)
Top GitHub Comments
I ran into exactly the same issue today. I was also thinking that the parameter label_names in TrainingArguments refers to data["train"].features["label"].names. The error message IndexError: tuple index out of range was not helpful at all, and I only found the problem by trial and error.
Actually, I was not able to find the description for label_names in the documentation, but only in the linked source code. Besides, I don't even understand what "The list of keys in your dictionary of inputs that correspond to the labels." should mean.
What "dictionary of inputs" and what "list of keys"?
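For reference, the "dictionary of inputs" in the docstring is not the DatasetDict: it is the per-batch dict that the data collator builds and that the Trainer passes to model(**inputs). A sketch of such a batch (values shown as plain lists; in practice they are tensors):

```python
# Per-batch inputs dict as fed to model(**inputs) by the Trainer.
batch = {
    "input_ids": [[101, 2054, 102]],
    "attention_mask": [[1, 1, 1]],
    "labels": [2],  # <- label_names should list keys like this one
}

# For a standard single-label classification setup:
label_names = ["labels"]
assert all(name in batch for name in label_names)
print("label keys present:", [k for k in batch if k in label_names])
# label keys present: ['labels']
```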
My dataset looks like this:
The only dictionaries I see are the DatasetDict with keys "train" and "test", and each Dataset with keys "features" and "num_rows". It would be really helpful if the description of the parameter label_names and the error message could be improved.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.