Accuracy computation with variable classes
❓ Questions/Help/Support
I’ve been using the `Accuracy` class (in `ignite.metrics`) successfully so far in a multi-class setting to compute test set accuracy.
Recently, I’ve hit a new scenario/dataset where the test set has a different number of classes for each example. You can think of this as a multiple-choice selection task, where each example has a different number of candidates.
Now in this scenario, when I used `Accuracy`, I got the error below:

`ERROR:ignite.engine.engine.Engine:Current run is terminating due to exception: Input data number of classes has changed from 13 to 81.`
I dove into the code and I see that the base class for `Accuracy`, i.e., `_BaseClassification`, assumes that the number of classes/candidates is fixed across test set examples. That is why I’m getting the above error at run time.
However, for multi-class accuracy, as you can see in the `update()` method here, we simply compute the number of correct predictions via `argmax` along dimension 1 and measure matches against `y`.
This computation of `correct` should be accurate even if the number of candidates changes across two examples from the test set, right? So in effect, the computed accuracy would be correct even in such a case, because the underlying variables are computed correctly?
What do you think, @vfdev-5? Is it possible to have a version of `Accuracy` for multi-class where there is no expectation that the number of classes is the same across all examples in the test set?
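If it helps, the per-example accumulation I have in mind can be sketched in plain Python (illustrative only; `update`, `compute`, and the `state` dict are hypothetical names, not ignite’s API):

```python
def update(state, y_pred_logits, y_true):
    """Accumulate correct/total counts for one example.

    y_pred_logits: one score per candidate (length may differ
                   between examples).
    y_true: index of the correct candidate.
    """
    # argmax over the candidate dimension, whatever its size
    predicted = max(range(len(y_pred_logits)), key=lambda i: y_pred_logits[i])
    state["correct"] += int(predicted == y_true)
    state["total"] += 1

def compute(state):
    return state["correct"] / state["total"]

state = {"correct": 0, "total": 0}
update(state, [0.1, 0.7, 0.2], 1)             # 3 candidates, correct
update(state, [0.4, 0.1, 0.2, 0.9, 0.3], 0)   # 5 candidates, wrong
print(compute(state))  # 0.5
```

The candidate count never enters the accumulated state, so it is free to vary per example.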
Issue Analytics
- State:
- Created: 4 years ago
- Comments: 6
Top GitHub Comments
@vfdev-5 Thanks a lot for the quick response! Here’s an example model: `GPT2DoubleHeadsModel` from the `transformers` library by Hugging Face. For the multiple-choice classification task, in principle one could have a varying number of candidates/choices across examples in the test set.
So `x1` might have `C1` candidates, `x2` might have `C2` candidates, `x3` might have `C3` candidates, and so on, like you showed above. The model would output logits over the candidate set for `x1`, another set of logits over the candidate set for `x2`, etc. And we would know the multiple-choice label for `x1`, the multiple-choice label for `x2`, etc. So we should be able to compute the accuracy by computing the match b/w logits and labels for `x1`, then for `x2`, and so on. And then finally we would have the total test set accuracy.

I think padding to the maximum number of classes should work, but that can cause memory issues when the maximum number of classes is very high compared to most of the examples.
Fixing the number of classes might make sense at training time to make life easier, but that might not be the case at inference time.
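For reference, the padding workaround can be sketched like this (an illustrative helper, `pad_logits` is a made-up name): pad each example’s logits to the maximum candidate count with `-inf`, so every row’s argmax, and hence the accuracy, is unchanged.

```python
import math

def pad_logits(batch_logits):
    """Pad variable-length logit rows to a common width with -inf,
    which leaves each row's argmax unchanged."""
    max_c = max(len(row) for row in batch_logits)
    return [row + [-math.inf] * (max_c - len(row)) for row in batch_logits]

batch = [[0.1, 0.7],                   # 2 candidates
         [0.4, 0.1, 0.2, 0.9, 0.3]]    # 5 candidates
padded = pad_logits(batch)
print([len(row) for row in padded])  # [5, 5]
```

The memory cost is the concern raised above: every example is stored at the width of the largest candidate set.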
@g-karthik maybe `TopKCategoricalAccuracy` can help?