ndcg_score fails for negative scores
Description
The method ndcg_score from sklearn.metrics fails when the true relevance scores are negative.
Steps/Code to Reproduce
import numpy as np
from sklearn.metrics import ndcg_score
y_true = np.array([-0.89, -0.53, -0.47, 0.39, 0.56]).reshape(1, -1)
y_score = np.array([0.07, 0.31, 0.75, 0.33, 0.27]).reshape(1, -1)
print(ndcg_score(y_true, y_score))  # Should be between 0 and 1 as per the docstring
# Output: 396.0329594603174
print(ndcg_score(y_true + 1, y_score + 1))
# Output: 0.7996030755957273
Expected Results
The documentation doesn’t explicitly state that y_true or y_score should be non-negative. The cited Wikipedia article for DCG doesn’t seem to mention a non-negativity assumption either. So either the method should be able to deal with scores regardless of sign, or the documentation should explicitly say otherwise.
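To see where 396.03 comes from, here is a minimal NumPy sketch of plain NDCG (DCG / ideal DCG) using linear gains and a 1 / log2(position + 1) discount, one of the DCG forms on the cited Wikipedia page. It is not scikit-learn's code and assumes y_score has no ties (it simply sorts by y_score), but on the example above it reproduces both reported outputs. The ideal DCG here is only slightly below zero, so dividing by it throws the ratio far outside [0, 1].

```python
import numpy as np


def dcg(relevance_in_rank_order):
    """DCG with linear gains: sum of rel_i / log2(i + 1) over positions i = 1..n."""
    rel = np.asarray(relevance_in_rank_order, dtype=float)
    positions = np.arange(1, rel.size + 1)
    return float(np.sum(rel / np.log2(positions + 1)))


def ndcg_ratio(y_true, y_score):
    """Plain NDCG = DCG / ideal DCG for a single query (assumes no ties in y_score)."""
    y_true = np.asarray(y_true, dtype=float)
    order = np.argsort(-np.asarray(y_score, dtype=float))  # highest score ranked first
    actual = dcg(y_true[order])
    ideal = dcg(np.sort(y_true)[::-1])  # relevances sorted from best to worst
    return actual / ideal


y_true = np.array([-0.89, -0.53, -0.47, 0.39, 0.56])
y_score = np.array([0.07, 0.31, 0.75, 0.33, 0.27])

print(dcg(np.sort(y_true)[::-1]))           # ideal DCG, about -0.0015
print(ndcg_ratio(y_true, y_score))          # about 396.03
print(ndcg_ratio(y_true + 1, y_score + 1))  # about 0.7996
```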
Disclosure/Question: I’m not an expert in ranking metrics, but there seem to be cases where one would want to compare lists of scores in (-∞, ∞) based on their ordering alone (i.e., not with MSE or related metrics). Is there any other metric in scikit-learn that is more appropriate for that use case?
Versions
numpy: 1.18.1
scipy: 1.4.1
sklearn: 0.22.1
Top GitHub Comments
Hi there, the implementation in the branch that mentioned this issue issues a DeprecationWarning, as suggested by jeromesdockes. This gives something like the best of both worlds: the current behavior still works, but users are warned that ndcg_score does not always return a value between 0 and 1 when y_true contains negative values.
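A minimal sketch of that kind of check, assuming the warning fires whenever any y_true value is negative. The helper name `warn_if_negative_relevance` and the message text are hypothetical, not taken from the branch. Python warnings are issued with `warnings.warn` rather than raised, so the caller still gets a result.

```python
import warnings

import numpy as np


def warn_if_negative_relevance(y_true):
    """Hypothetical helper: warn (rather than fail) when y_true has negative entries."""
    if np.any(np.asarray(y_true, dtype=float) < 0):
        warnings.warn(
            "ndcg_score may return values outside [0, 1] when y_true "
            "contains negative values.",
            DeprecationWarning,
            stacklevel=2,
        )


# DeprecationWarning is hidden by default outside __main__, so record it explicitly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_if_negative_relevance([[-0.89, -0.53, -0.47, 0.39, 0.56]])
print([w.category.__name__ for w in caught])  # ['DeprecationWarning']
```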
Feel free to suggest other changes if you have something else in mind. I’ll open a PR to this repo within a few days, once the docs and tests are done.
This is what I think of this issue. Please let me know your comments.
NDCG is a metric whose value should lie between 0 and 1. It is given by

$$\mathrm{NDCG} = \frac{\mathrm{DCG}_{\text{actual}}}{\mathrm{DCG}_{\text{ideal}}}$$

Rephrasing it, NDCG measures where our DCG sits on a scale from 0 to the ideal DCG. That scale implicitly takes the minimum DCG to be 0, which is a valid lower bound when every y_true value is non-negative, but not when some y_true values are negative. Correcting the rephrased sentence, NDCG actually measures where our DCG sits on a scale from the minimum DCG to the ideal DCG. So the NDCG formula should have been

$$\mathrm{NDCG} = \frac{\mathrm{DCG}_{\text{actual}} - \mathrm{DCG}_{\min}}{\mathrm{DCG}_{\text{ideal}} - \mathrm{DCG}_{\min}}$$

When the lower bound (i.e., the minimum DCG) is 0, this reduces to the formula we have always used. The next question is how to calculate the ideal DCG and the minimum DCG given y_true and y_score.
NDCG score calculation:
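A minimal NumPy sketch of the min-DCG formula for a single query. It assumes the ideal DCG ranks y_true from highest to lowest and the minimum DCG ranks it from lowest to highest (with decreasing discounts, that ordering gives the smallest possible sum), all with the same 1 / log2(position + 1) discount. It breaks ties in y_score with a stable sort instead of averaging over them. `minmax_ndcg` is a hypothetical name, and returning 0.0 when all y_true values are equal (ideal DCG = min DCG) is an assumed convention.

```python
import numpy as np


def dcg(relevance_in_rank_order):
    """DCG with linear gains and a 1 / log2(position + 1) discount."""
    rel = np.asarray(relevance_in_rank_order, dtype=float)
    positions = np.arange(1, rel.size + 1)
    return float(np.sum(rel / np.log2(positions + 1)))


def minmax_ndcg(y_true, y_score):
    """Hypothetical (DCG - min DCG) / (ideal DCG - min DCG) for a single query."""
    y_true = np.asarray(y_true, dtype=float)
    y_score = np.asarray(y_score, dtype=float)
    order = np.argsort(-y_score, kind="stable")  # rank items by descending score
    actual = dcg(y_true[order])
    ideal = dcg(np.sort(y_true)[::-1])  # best possible ordering of y_true
    worst = dcg(np.sort(y_true))        # worst possible ordering of y_true
    if ideal == worst:                  # all y_true equal: every ranking scores the same
        return 0.0                      # assumed convention
    return (actual - worst) / (ideal - worst)


y_true = [-0.89, -0.53, -0.47, 0.39, 0.56]
y_score = [0.07, 0.31, 0.75, 0.33, 0.27]
print(minmax_ndcg(y_true, y_score))  # roughly 0.45, now inside [0, 1]
```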
On a final note, y_score values only provide the relative ranks (positions). The y_score values themselves are not used in computing DCG (or NDCG); only the positions they induce are. So shifting the y_score values by a constant shouldn’t affect the NDCG score, right (since the positions wouldn’t change)? And with a formula that includes the min DCG, I can see this holding true.
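A short derivation of why the min-DCG formula is unaffected by shifts, assuming every y_true value is shifted by the same constant c and that the actual, ideal and minimum DCG (written DCG_actual, DCG_ideal, DCG_min) all sum over the same n positions with discounts d_i = 1/log2(i + 1). Here r_i is the y_true value of the item at position i. A constant shift does not change the sort order of y_true, so the ideal and worst orderings stay the same and each of the three DCGs gains the same amount cS:

```latex
S = \sum_{i=1}^{n} d_i, \qquad
\mathrm{DCG}'_{\text{actual}} = \sum_{i=1}^{n} (r_i + c)\, d_i = \mathrm{DCG}_{\text{actual}} + cS

\frac{\mathrm{DCG}'_{\text{actual}} - \mathrm{DCG}'_{\min}}{\mathrm{DCG}'_{\text{ideal}} - \mathrm{DCG}'_{\min}}
= \frac{(\mathrm{DCG}_{\text{actual}} + cS) - (\mathrm{DCG}_{\min} + cS)}{(\mathrm{DCG}_{\text{ideal}} + cS) - (\mathrm{DCG}_{\min} + cS)}
= \frac{\mathrm{DCG}_{\text{actual}} - \mathrm{DCG}_{\min}}{\mathrm{DCG}_{\text{ideal}} - \mathrm{DCG}_{\min}}
```

Without the min-DCG term the ratio becomes (DCG_actual + cS) / (DCG_ideal + cS), which does depend on c. That is why the score in the issue moved from 396.03 to 0.7996 when y_true was shifted by 1; the accompanying y_score shift plays no role, since it leaves the ranking unchanged.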