ndcg_score fails for negative scores

See original GitHub issue

Description

The function ndcg_score from sklearn.metrics returns values outside the documented [0, 1] range when the true relevance scores are negative.

Steps/Code to Reproduce

import numpy as np
from sklearn.metrics import ndcg_score 

y_true = np.array([-0.89, -0.53, -0.47, 0.39, 0.56]).reshape(1, -1)
y_score = np.array([0.07, 0.31, 0.75, 0.33, 0.27]).reshape(1, -1)

ndcg_score(y_true, y_score)  # Should be between 0 and 1 as per the docstring
# Output: 396.0329594603174

ndcg_score(y_true + 1, y_score + 1)
# Output: 0.7996030755957273
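The value 396.03 can be traced by hand. The following is a minimal pure-Python sketch, assuming the standard log2 discount and no tied scores; `dcg` is a hypothetical helper written for illustration, not the scikit-learn implementation.

```python
import math


def dcg(y_true, y_score):
    """DCG of y_true ranked by descending y_score (log2 discount, ties not handled)."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    return sum(y_true[i] / math.log2(rank + 2) for rank, i in enumerate(order))


y_true = [-0.89, -0.53, -0.47, 0.39, 0.56]
y_score = [0.07, 0.31, 0.75, 0.33, 0.27]

actual_dcg = dcg(y_true, y_score)  # about -0.592
ideal_dcg = dcg(y_true, y_true)    # about -0.0015 (best possible ordering)
ratio = actual_dcg / ideal_dcg     # about 396.03

print(actual_dcg, ideal_dcg, ratio)
```

With negative relevances the ideal DCG here is a small negative number, so dividing by it produces a large positive ratio instead of a value in [0, 1].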

Expected Results

The documentation doesn’t explicitly state that y_true or y_score should be non-negative. The cited Wikipedia article for DCG doesn’t seem to mention a non-negativity assumption either. So either the method should be able to deal with scores regardless of sign, or the documentation should explicitly say otherwise.

Disclosure/Question: I’m not an expert in ranking metrics, but it seems that there might be cases where one might want to compare lists of scores in (-∞,∞) based on their ordering alone (i.e., not MSE or related metrics). Is there any other metric in scikit-learn that is more appropriate for that use case?

Versions

numpy: 1.18.1
scipy: 1.4.1
sklearn: 0.22.1

Issue Analytics

State:
Created 3 years ago
Reactions: 2
Comments: 14 (9 by maintainers)

Top GitHub Comments

2 reactions

trinhcon commented, Mar 3, 2022

Hi there. The implementation in the branch that mentioned this issue raises a “DeprecationWarning”, as suggested by jeromesdockes. This gives the best of both worlds: current functionality keeps working, while the warning tells users that ndcg_score may not return results between 0 and 1 when y_true contains negative values.

Feel free to suggest other changes if you have something else in mind. I’ll be opening a PR against this repo within a few days, once the docs and tests are done.

2 reactions

dsandeep0138 commented, Jun 23, 2020

This is what I think of this issue. Please let me know your comments.

NDCG is a metric that should take values between 0 and 1. It is defined as NDCG = actual DCG / Ideal DCG. In other words, NDCG measures where our DCG falls on a scale from 0 to Ideal DCG. That scale assumes the lower bound is 0, which is a valid bound when all y_true values are non-negative, but not when some of them are negative. Correcting the rephrased sentence: NDCG should measure where our DCG falls on a scale from min DCG to Ideal DCG.

So the NDCG formula should be NDCG = (actual DCG - min DCG) / (Ideal DCG - min DCG). When the lower bound (i.e., the minimum) is 0, we get back the formula we have always used.

Next: how do we calculate Ideal DCG and min DCG given y_true and y_score?

# From the wiki: sorting all relevant documents in the corpus by their
# relative relevance produces the maximum possible DCG through a position,
# also called Ideal DCG (IDCG) through that position.
# y_true is a (1, n) NumPy array; using it as the score ranks the
# documents from most to least relevant.

from sklearn.metrics import dcg_score

max_dcg = dcg_score(y_true, y_true)  # Ideal DCG

# Likewise, to get min DCG, rank the documents in the opposite direction
# (least relevant to most relevant) by negating the scores.

min_dcg = dcg_score(y_true, -y_true)

NDCG score calculation:

y_true = np.array([-0.89, -0.53, -0.47, 0.39, 0.56]).reshape(1, -1)
y_score = np.array([0.07, 0.31, 0.75, 0.33, 0.27]).reshape(1, -1)
max_dcg = -0.001494970324771916
min_dcg = -1.0747913396929056
actual_dcg = -0.5920575220247735
# (actual_dcg - min_dcg) / (max_dcg - min_dcg)
ndcg_score = 0.44976749334605975
On a final note, the y_score values only provide the relative ranks (or positions). The values themselves are not involved in the calculation of DCG (or NDCG); only the positions they induce are used. So shifting the y_score values by a constant amount shouldn’t affect the NDCG score, right (because the positions wouldn’t change)? And with a formula that includes min DCG, I can see this hold true.

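# I have increased all y_score values by 1.
# The DCG values below also reflect y_true shifted by 1 (as in the
# y_true+1 call in the original report); shifting y_score alone would
# leave every DCG value unchanged.
y_true = np.array([0.11, 0.47, 0.53, 1.39, 1.56]).reshape(1, -1)
y_score = np.array([1.07, 1.31, 1.75, 1.33, 1.27]).reshape(1, -1)
max_dcg = 2.9469641485546205
min_dcg = 1.8736677791864862
actual_dcg = 2.3564015968546186
ndcg_score = 0.4497674933460597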
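Why the min-max form is unaffected by adding a constant to the relevances can be seen with a short derivation. This is a sketch under the assumptions of a fixed list length n, the log2 discount, and no ties; the symbols d_i, D, c, and π (a ranking) are introduced only for this illustration.

```latex
\begin{align*}
d_i &= \frac{1}{\log_2(i+1)}, \qquad D = \sum_{i=1}^{n} d_i \\
\mathrm{DCG}_\pi(y + c) &= \sum_{i=1}^{n} d_i \,\bigl(y_{\pi(i)} + c\bigr)
  = \mathrm{DCG}_\pi(y) + cD \\
\frac{\mathrm{DCG}(y+c) - \mathrm{DCG}_{\min}(y+c)}
     {\mathrm{IDCG}(y+c) - \mathrm{DCG}_{\min}(y+c)}
  &= \frac{\mathrm{DCG}(y) - \mathrm{DCG}_{\min}(y)}
          {\mathrm{IDCG}(y) - \mathrm{DCG}_{\min}(y)}
\end{align*}
```

The best and worst orderings of y + c are the same as those of y, so the term cD cancels in both numerator and denominator. The plain ratio DCG / IDCG instead becomes (DCG + cD) / (IDCG + cD), which depends on c. For n = 5, D is roughly 2.9485, which matches the gap between the two max_dcg values listed above.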
