metrics.ndcg_score is busted
Description
metrics.ndcg_score is busted
Steps/Code to Reproduce
from sklearn import metrics
# test 1
y_true = [0, 1, 2, 1]
y_score = [[0.15, 0.55, 0.2], [0.7, 0.2, 0.1], [0.06, 0.04, 0.9], [0.1, 0.3, 0.6]]
metrics.ndcg_score(y_true, y_score)
# test 2
y_true = [0, 1, 0, 1]
y_score = [[0.15, 0.85], [0.7, 0.3], [0.06, 0.94], [0.7, 0.3]]
metrics.ndcg_score(y_true, y_score)
Expected Results
No error is thrown.
Actual Results
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-35bb0e2c9b0e> in <module>()
----> 1 metrics.ndcg_score(y_true, y_score)
/Users/iancassidy/virtualenvs/upside/lib/python2.7/site-packages/sklearn/metrics/ranking.py in ndcg_score(y_true, y_score, k)
849
850 if binarized_y_true.shape != y_score.shape:
--> 851 raise ValueError("y_true and y_score have different value ranges")
ValueError: y_true and y_score have different value ranges
Versions
Darwin-16.7.0-x86_64-i386-64bit
('Python', '2.7.10 (default, Feb 7 2017, 00:08:15) \n[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)]')
('NumPy', '1.13.3')
('SciPy', '0.19.1')
('Scikit-Learn', '0.19.0')
Issue Analytics
- State:
- Created 6 years ago
- Comments:21 (18 by maintainers)
Top GitHub Comments
I completely agree. NDCG is meant to evaluate a ranking with respect to the true scores of the scored entities.
(See the Wikipedia page; Järvelin, K., & Kekäläinen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems (TOIS), 20(4), 422-446; or Wang, Y., Wang, L., Li, Y., He, D., Chen, W., & Liu, T. Y. (2013, May). A theoretical analysis of NDCG ranking measures. In Proceedings of the 26th Annual Conference on Learning Theory (COLT 2013).)
For example, evaluate a ranking of answers to a query with respect to the actual relevance of the answers. Ogrisel’s code, to which @qinhanmin2014 provided a link, is a typical use case of NDCG, and the implementation is correct. So ndcg_score should accept two 2-D arrays of the same shape: y_score contains the scores inducing the predicted ranking, and y_true contains a floating-point value (e.g. relevance, term frequency, …) for each output dimension. For example, a call with a 2-D y_true of relevance values and a y_score of the same shape should be OK.
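A minimal pure-Python sketch of the interface described above, for illustration only. The helper names `dcg` and `ndcg_sketch` and the sample values are hypothetical; this is not scikit-learn's implementation. It assumes a linear gain and a log2 discount, and averages the per-row NDCG.

```python
import math


def dcg(true_row, score_row):
    """Discounted cumulative gain of one row (hypothetical helper)."""
    # Rank items by predicted score, highest first (ties keep input order).
    order = sorted(range(len(score_row)), key=lambda i: score_row[i], reverse=True)
    # Sum the true values in that order, discounted by log2(rank + 1), ranks from 1.
    return sum(true_row[i] / math.log2(rank + 2) for rank, i in enumerate(order))


def ndcg_sketch(y_true, y_score):
    """Mean NDCG over rows; y_true and y_score are same-shape 2-D lists."""
    same_rows = len(y_true) == len(y_score)
    if not same_rows or any(len(t) != len(s) for t, s in zip(y_true, y_score)):
        raise ValueError("y_true and y_score must have the same shape")
    values = []
    for t, s in zip(y_true, y_score):
        ideal = dcg(t, t)  # best achievable DCG: rank by the true values
        values.append(dcg(t, s) / ideal if ideal > 0 else 0.0)
    return sum(values) / len(values)


# Illustrative data: true relevance values and predicted scores, same shape.
y_true = [[1.0, 0.0, 2.0], [0.0, 0.5, 1.2]]
y_score = [[0.2, 0.1, 0.5], [1.2, 0.8, 0.1]]
result = ndcg_sketch(y_true, y_score)
```

The first row is ranked perfectly and the second is ranked in reverse, so the mean falls strictly between 0 and 1.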
We can check whether y_true is a vector of labels instead of a matrix of true scores and perform one-hot encoding, but since this is not the most common use case, it may be better to keep the interface simple and let users one-hot encode if they want to do this.
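If a user does want to go from class labels to a matrix of true scores, the one-hot step could be sketched as follows. The `one_hot` helper is hypothetical and written in plain Python; the data is taken from test 1 in the report, and the number of classes is assumed to equal the number of columns in y_score.

```python
def one_hot(labels, n_classes):
    """Turn integer class labels into one-hot rows (hypothetical helper)."""
    rows = []
    for label in labels:
        if not 0 <= label < n_classes:
            raise ValueError("label %r is outside range(%d)" % (label, n_classes))
        row = [0.0] * n_classes
        row[label] = 1.0  # relevance 1 for the true class, 0 elsewhere
        rows.append(row)
    return rows


# Data from test 1 of the report.
y_true = [0, 1, 2, 1]
y_score = [[0.15, 0.55, 0.2], [0.7, 0.2, 0.1], [0.06, 0.04, 0.9], [0.1, 0.3, 0.6]]

# Assumption: one column of y_score per class.
y_true_2d = one_hot(y_true, n_classes=len(y_score[0]))
```

After this step, y_true_2d has the same shape as y_score, which is the shape requirement stated in the comments.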
NDCG is not for classification; y_score and y_true should have the same shape