label_ranking_average_precision_score: sample_weight isn't applied to items with zero true labels
Description
label_ranking_average_precision_score offers a sample_weight argument to allow nonuniform contribution of individual samples to the reported metric. Separately, individual samples whose labels are the same for all classes (all true or all false) are treated as a special case (precision == 1, line 732). However, this special case bypasses the application of sample_weight (line 740). So, when there is both a non-default sample_weight and at least one sample with, for instance, zero true labels, the reported metric is wrong.
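To make the control-flow problem concrete, here is a simplified sketch of the metric's per-sample loop (a paraphrase for illustration, not the actual scikit-learn source): the all-equal-labels branch adds 1.0 to the running total without ever touching sample_weight.

```python
import numpy as np

def lraps_sketch(y_true, y_score, sample_weight=None):
    """Simplified sketch of label_ranking_average_precision_score
    illustrating the bug; the ranking math is paraphrased and this is
    NOT the actual scikit-learn implementation."""
    n_samples, n_labels = y_true.shape
    out = 0.0
    for i in range(n_samples):
        relevant = np.flatnonzero(y_true[i])
        if len(relevant) == 0 or len(relevant) == n_labels:
            out += 1.0  # BUG: sample_weight[i] is never applied here
            continue
        aux = 0.0
        for j in relevant:
            # rank of score j among all scores, and among relevant scores
            rank = np.sum(y_score[i] >= y_score[i, j])
            rel_rank = np.sum(y_score[i, relevant] >= y_score[i, j])
            aux += rel_rank / rank
        score = aux / len(relevant)
        if sample_weight is not None:
            score *= sample_weight[i]
        out += score
    if sample_weight is not None:
        return out / np.sum(sample_weight)
    return out / n_samples
```

On the reproduction data below, this sketch returns the same wrong 1.125: the third (zero-label, zero-weight) sample contributes an unweighted 1.0 to the numerator while contributing 0.0 to the denominator.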
Steps/Code to Reproduce
See example in this colab
import numpy as np
import sklearn.metrics
# Per sample APs are 0.5, 0.75, and 1.0 (default for zero labels).
truth = np.array([[1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 0, 0]], dtype=bool)
scores = np.array([[0.3, 0.4, 0.2, 0.1], [0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]])
print(sklearn.metrics.label_ranking_average_precision_score(
    truth, scores, sample_weight=[1.0, 1.0, 0.0]))
Expected Results
Weighted average of the APs of the first and second samples: (0.5 + 0.75) / 2 = 0.625
Actual Results
Sum of the APs of all three samples, divided by the sum of the weight vector: (0.5 + 0.75 + 1.0) / (1.0 + 1.0 + 0.0) = 2.25 / 2 = 1.125
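The discrepancy can be verified by hand with the per-sample APs assumed in the example above: the zero-weight third sample should drop out of a correctly weighted average, but the buggy computation adds its unweighted 1.0 to the numerator anyway.

```python
import numpy as np

# Per-sample APs from the example above (1.0 is the special-case value
# assigned to the zero-label third sample).
per_sample_ap = np.array([0.5, 0.75, 1.0])
weights = np.array([1.0, 1.0, 0.0])

# Correct weighted average: the zero-weight sample contributes nothing.
expected = np.sum(per_sample_ap * weights) / np.sum(weights)  # 0.625

# Buggy computation: the special-case 1.0 enters the sum unweighted.
actual = (0.5 * 1.0 + 0.75 * 1.0 + 1.0) / np.sum(weights)  # 1.125
```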
Versions
System: python: 3.6.7 (default, Oct 22 2018, 11:32:17) [GCC 8.2.0] executable: /usr/bin/python3 machine: Linux-4.14.79+-x86_64-with-Ubuntu-18.04-bionic
BLAS: macros: SCIPY_MKL_H=None, HAVE_CBLAS=None lib_dirs: /usr/local/lib cblas_libs: mkl_rt, pthread
Python deps: pip: 19.0.3 setuptools: 40.8.0 sklearn: 0.20.3 numpy: 1.14.6 scipy: 1.1.0 Cython: 0.29.6 pandas: 0.22.0
Issue Analytics
- Created 5 years ago
- Comments: 7 (4 by maintainers)
Top GitHub Comments
I don’t think this is quite the fix we want. The edge case is not that the sample weight is zero (I just used that in the example to make the impact easy to see). The problem is to account for any kind of non-default sample weight in the case of constant labels for all classes.
I’ll work on a solution and some tests.
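One possible shape of the fix (a hypothetical sketch under the same simplified structure as the paraphrase above, not the merged patch): apply the sample weight in the special-case branch as well, so both code paths contribute consistently to the weighted sum.

```python
import numpy as np

def lraps_fixed_sketch(y_true, y_score, sample_weight=None):
    """Hypothetical fix sketch: the all-equal-labels special case now
    receives the same per-sample weight as every other sample.
    NOT the actual scikit-learn patch."""
    n_samples, n_labels = y_true.shape
    if sample_weight is None:
        sample_weight = np.ones(n_samples)
    sample_weight = np.asarray(sample_weight, dtype=float)
    out = 0.0
    for i in range(n_samples):
        relevant = np.flatnonzero(y_true[i])
        if len(relevant) == 0 or len(relevant) == n_labels:
            out += 1.0 * sample_weight[i]  # weight applied here too
            continue
        aux = 0.0
        for j in relevant:
            rank = np.sum(y_score[i] >= y_score[i, j])
            rel_rank = np.sum(y_score[i, relevant] >= y_score[i, j])
            aux += rel_rank / rank
        out += (aux / len(relevant)) * sample_weight[i]
    return out / np.sum(sample_weight)
```

On the reproduction data this returns the expected 0.625, since the zero-weight third sample no longer leaks into the numerator.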
Bookmarking this. As a first-timer, this is super helpful. From this, I understand which file I'm supposed to be writing test cases in and the fact that they should be robust. It also really helps drive home the point about writing code in such a way that very minimal changes are needed to EXTEND it. I can see how 4 lines of code achieve this because of how the REST of the code is written overall.
I should develop the patience and willpower to work out the math formula for this listed in the documentation. Had I done that, I would've written more meaningful code. I'll spend some time and try to go over why your lines make sense.
Thanks a ton for pinging this here and giving me a notification.