Possible error in ndcg_diffs calculation
I think it should be (with `log2` on the denominator only):
```python
ndcg_diffs = (1 / (1 + doc_ranks[:n_rel]).log2()) - (1 / (1 + doc_ranks[n_rel:]).log2()).view(n_irr)
```
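To see why the placement of `.log2()` matters, here is a quick sketch with a made-up rank value: the DCG discount is the reciprocal of the log, not the log of the reciprocal.

```python
import math

rank = 2  # hypothetical 1-indexed rank of a document

# Log of the reciprocal (the suspected bug): always non-positive,
# since 1 / (1 + rank) <= 1.
wrong = math.log2(1 / (1 + rank))

# Reciprocal of the log (the intended DCG discount).
right = 1 / math.log2(1 + rank)
```

The two expressions only coincide by accident, so where the `.log2()` call sits changes the result.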
Issue Analytics
- State:
- Created 5 years ago
- Comments: 5 (3 by maintainers)
The relevance is the human label for the document, while the rank is where the document ends up after being scored by the model. To be a little more concrete about the calculations, imagine we provide a scoring model `M` with a query `Q` and it returns a set of ten documents in the following order: `(D_6, D_0, D_7, D_4, D_8, D_5, D_9, D_1, D_3, D_2)`. A human could then go through each document and assign it a relevance of (in the binary case) one or zero for the given `Q` (an alternative labeling strategy would be to have several relevance levels, e.g., “very relevant”, “somewhat relevant”, and “irrelevant”). Once we have this information, we can calculate the DCG for this result set, and each `D_i` will contribute a term of

`dcg_i = (2 ^ rel_i - 1) / log2(rank_i + 1)`

to the summation. If we swap `D_0` and `D_3` and calculate the new DCG, all of the `dcg_i'` terms except for `dcg_0'` and `dcg_3'` will stay the same, so those terms will cancel when taking the difference (this is true whether we use binary relevance or not). As a result, what we have left is:

`(dcg_0 - dcg_0') + (dcg_3 - dcg_3')`

If we pretend like both documents are relevant, then (with `D_0` at rank 2 and `D_3` at rank 9):

`dcg_0 = 1 / log2(3)` and `dcg_3 = 1 / log2(10)`

After swapping them, we have:

`dcg_0' = 1 / log2(10)` and `dcg_3' = 1 / log2(3)`

so:

`(dcg_0 - dcg_0') + (dcg_3 - dcg_3') = 0`
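Plugging in the example ordering above (with `D_0` at rank 2 and `D_3` at rank 9, both taken as relevant), a quick check that swapping two relevant documents leaves the DCG unchanged:

```python
import math

def dcg_term(rel, rank):
    # One document's contribution to the DCG summation.
    return (2 ** rel - 1) / math.log2(rank + 1)

# Both documents relevant: D_0 at rank 2, D_3 at rank 9.
before = dcg_term(1, 2) + dcg_term(1, 9)
after = dcg_term(1, 9) + dcg_term(1, 2)  # ranks swapped

delta_dcg = before - after  # → 0.0
```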
The same is true when you have two irrelevant documents, but it’s even less interesting because their `dcg_i`s are equal to zero. In the case where `D_0` is relevant and `D_3` is irrelevant, we have:

`dcg_0 = 1 / log2(3)`, `dcg_3 = 0`, `dcg_0' = 1 / log2(10)`, `dcg_3' = 0`

so:

`(dcg_0 - dcg_0') + (dcg_3 - dcg_3') = 1 / log2(3) - 1 / log2(10)`

which is what I’m calculating.
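The mixed case can be checked the same way, again with the example ranks (2 and 9): the irrelevant document contributes zero before and after, so the difference reduces to `1/log2(3) - 1/log2(10)`.

```python
import math

def dcg_term(rel, rank):
    # One document's contribution to the DCG summation.
    return (2 ** rel - 1) / math.log2(rank + 1)

# D_0 relevant at rank 2, D_3 irrelevant at rank 9.
before = dcg_term(1, 2) + dcg_term(0, 9)
after = dcg_term(1, 9) + dcg_term(0, 2)  # ranks swapped

delta_dcg = before - after
```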
That’s what I’m calculating. As I mention in the code, I’m assuming binary relevance. Because the denominator for the normalized discounted cumulative gain (NDCG) won’t change following a ranking swap, we can ignore it until the end (I changed the name of the `ndcg_diffs` variable to `dcg_diffs` to make it a little more clear what’s being computed right there). The discounted cumulative gain (DCG) is a summation of terms corresponding to each document, where document `i`’s term is equal to `(2 ^ rel_i - 1) / log2(rank_i + 1)`. As a result, swapping the ranks of two documents only changes the values of the corresponding two terms in the summation. Therefore, swapping two relevant documents or two irrelevant documents will result in the same DCG, and so the difference between those two NDCGs will be zero (i.e., they don’t need to be computed). Because the value of a term associated with an irrelevant document is zero (i.e., `2 ^ 0 - 1 = 0`), the difference in the DCGs following a ranking swap between a relevant and an irrelevant document is simply the difference between the value of the term for the relevant document in its original ranking and the value of the term for the relevant document in its “irrelevant” ranking. `1 / (1 + doc_ranks[:n_rel]).log2()` gives the terms corresponding to the relevant documents in their original rankings, and `1 / (1 + doc_ranks[n_rel:]).log2()` gives the terms corresponding to the relevant documents in their swapped rankings. You’ll notice the `dcg_diffs` tensor has a shape of `(n_rel, n_irr)` (i.e., it holds the DCG differences for all non-zero swaps). The reason I did it the way I did is that it’s vectorized, which runs much faster than a for loop (which is what the code you linked to uses). In fact, if you look at this commit, I was originally using a for loop (and had the logarithm in the right place; that explains why I remember it working!).
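The broadcasting idea can be sketched as follows. This is a minimal illustration, not the repository's actual code: it assumes PyTorch, binary relevance, 1-indexed ranks, and a `doc_ranks` tensor sorted relevant-documents-first, with made-up data. The vectorized result matches an explicit double loop.

```python
import math
import torch

# Hypothetical data: 2 relevant documents followed by 3 irrelevant
# ones; doc_ranks[i] is document i's 1-indexed rank after scoring.
doc_ranks = torch.tensor([2.0, 9.0, 1.0, 5.0, 7.0])
n_rel, n_irr = 2, 3

# Per-document DCG discount terms (binary relevance, so a relevant
# document's term is just 1 / log2(1 + rank)).
rel_terms = 1 / (1 + doc_ranks[:n_rel]).log2()
irr_terms = 1 / (1 + doc_ranks[n_rel:]).log2()

# An (n_rel, 1) column minus an (n_irr,) row broadcasts to the
# (n_rel, n_irr) tensor of DCG differences for every
# relevant/irrelevant swap.
dcg_diffs = rel_terms.view(n_rel, 1) - irr_terms

# The same quantity with an explicit double loop.
looped = torch.empty(n_rel, n_irr)
for i in range(n_rel):
    for j in range(n_irr):
        looped[i, j] = (1 / math.log2(1 + doc_ranks[i].item())
                        - 1 / math.log2(1 + doc_ranks[n_rel + j].item()))

assert torch.allclose(dcg_diffs, looped)
```

Each entry `dcg_diffs[i, j]` is the change in the relevant document's term if it traded ranks with irrelevant document `j`, which is all that survives of the DCG difference under binary relevance.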