Insidious fastutil, FeatureVector, and RM3 bug: massive regression impact!
I was trying to upgrade fastutil from version 6.5.6 (an ancient version from Jun 14, 2013) to the latest, version 8.3.0, when I came across a really insidious multi-part bug. The tl;dr is that there’s a bug in RM3, which will affect all regressions. Here’s the full story:
The class FeatureVector is built around the fastutil Object2FloatOpenHashMap class, which is used by the RM3 implementation to estimate relevance models. In the current implementation, when estimating the relevance model for the feedback docs, we truncate each individual feedback document:
docVector.pruneToSize(fbTerms);
This is the first part of the bug. Just because we ultimately want to select fbTerms terms for feedback doesn’t mean that we should only consider fbTerms terms from each document. This was probably done for performance reasons, although query latency really isn’t affected. I checked: on my iMac Pro, query latency doesn’t increase with that line removed.
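To make the first part concrete, here is a minimal sketch of how per-document pruning can change the final term selection. It is not Anserini's code: the class PerDocPruningSketch, its methods, the toy weights, and the unweighted sum used to aggregate documents are all assumptions made for illustration, and plain java.util maps stand in for FeatureVector. The term "shared" ranks third in each document, so pruning each document to fbTerms = 2 drops it, even though it has the highest aggregate weight.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the Anserini RM3 implementation.
final class PerDocPruningSketch {
  // Top-k terms by descending weight; ties broken by term so this demo is deterministic.
  static List<String> topTerms(Map<String, Float> weights, int k) {
    List<Map.Entry<String, Float>> entries = new ArrayList<>(weights.entrySet());
    entries.sort((a, b) -> {
      int byWeight = Float.compare(b.getValue(), a.getValue());
      return byWeight != 0 ? byWeight : a.getKey().compareTo(b.getKey());
    });
    List<String> top = new ArrayList<>();
    for (Map.Entry<String, Float> e : entries.subList(0, Math.min(k, entries.size()))) {
      top.add(e.getKey());
    }
    return top;
  }

  // Keep only the k highest-weight terms of a single document.
  static Map<String, Float> prune(Map<String, Float> doc, int k) {
    Map<String, Float> kept = new HashMap<>();
    for (String term : topTerms(doc, k)) {
      kept.put(term, doc.get(term));
    }
    return kept;
  }

  // Sum term weights over all docs (assumed uniform doc weights), then pick fbTerms terms.
  static List<String> feedbackTerms(List<Map<String, Float>> docs, int fbTerms, boolean prunePerDoc) {
    Map<String, Float> model = new HashMap<>();
    for (Map<String, Float> doc : docs) {
      Map<String, Float> source = prunePerDoc ? prune(doc, fbTerms) : doc;
      for (Map.Entry<String, Float> e : source.entrySet()) {
        model.merge(e.getKey(), e.getValue(), Float::sum);
      }
    }
    return topTerms(model, fbTerms);
  }

  // Two toy feedback documents; "shared" is third-ranked in both.
  static List<Map<String, Float>> exampleDocs() {
    Map<String, Float> d1 = new HashMap<>();
    d1.put("apple", 0.5f);
    d1.put("banana", 0.4f);
    d1.put("shared", 0.3f);
    Map<String, Float> d2 = new HashMap<>();
    d2.put("cherry", 0.45f);
    d2.put("date", 0.4f);
    d2.put("shared", 0.3f);
    List<Map<String, Float>> docs = new ArrayList<>();
    docs.add(d1);
    docs.add(d2);
    return docs;
  }

  public static void main(String[] args) {
    List<Map<String, Float>> docs = exampleDocs();
    System.out.println(feedbackTerms(docs, 2, false)); // [shared, apple]
    System.out.println(feedbackTerms(docs, 2, true));  // [apple, cherry]
  }
}
```

With pruning enabled, "shared" never reaches the aggregation step, so the selected feedback terms differ.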
Now this leads to the second part of the bug: the method pruneToSize sorts the features by weight, but it doesn’t consistently perform tie breaking. This means tie breaking is implementation specific, which means that the fastutil upgrade changed the tie-breaking behavior, which means that different terms are selected from documents, which changes the results.
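The dependence on iteration order can be reproduced in a few lines. The sketch below is an assumption-laden illustration, not the actual pruneToSize: the class TieBreakDemo and its methods are hypothetical, and two LinkedHashMap instances with different insertion orders stand in for two hash map versions that iterate the same entries in different orders. Because the sort compares weights only and List.sort is stable, tied terms keep whatever order the map produced.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the Anserini FeatureVector.
final class TieBreakDemo {
  // Keep the k highest-weight terms, comparing weights only.
  // Tied terms stay in the map's iteration order because the sort is stable.
  static List<String> pruneByWeightOnly(Map<String, Float> features, int k) {
    List<Map.Entry<String, Float>> entries = new ArrayList<>(features.entrySet());
    entries.sort((a, b) -> Float.compare(b.getValue(), a.getValue()));
    List<String> kept = new ArrayList<>();
    for (Map.Entry<String, Float> e : entries.subList(0, Math.min(k, entries.size()))) {
      kept.add(e.getKey());
    }
    return kept;
  }

  // Build a map whose terms all have the same weight, iterated in the given order.
  static Map<String, Float> tiedFeatures(String... termsInIterationOrder) {
    Map<String, Float> features = new LinkedHashMap<>();
    for (String term : termsInIterationOrder) {
      features.put(term, 1.0f);
    }
    return features;
  }

  public static void main(String[] args) {
    // Same entries, different iteration order, different surviving term.
    System.out.println(pruneByWeightOnly(tiedFeatures("alpha", "beta"), 1)); // [alpha]
    System.out.println(pruneByWeightOnly(tiedFeatures("beta", "alpha"), 1)); // [beta]
  }
}
```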
Insert facepalm here.
So to fix this, we need to:
- Stop pruning terms from individual feedback docs.
- Implement consistent tie-breaking behavior in the FeatureVector implementation, to prevent future issues along these lines.
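One possible shape for the second item is a comparator that orders by descending weight and then by the term itself. This is only a sketch under stated assumptions: DeterministicPruneSketch is a hypothetical stand-in that works on a plain java.util map and returns a new map, whereas the real FeatureVector wraps a fastutil Object2FloatOpenHashMap; choosing lexicographic term order as the tie-breaker is also an assumption, and any total order on terms would do.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the actual FeatureVector.pruneToSize.
final class DeterministicPruneSketch {
  // Keep the k highest-weight terms; ties are broken by term (ascending),
  // so the result does not depend on the input map's iteration order.
  static Map<String, Float> pruneToSize(Map<String, Float> features, int k) {
    List<Map.Entry<String, Float>> entries = new ArrayList<>(features.entrySet());
    entries.sort((a, b) -> {
      int byWeight = Float.compare(b.getValue(), a.getValue());
      return byWeight != 0 ? byWeight : a.getKey().compareTo(b.getKey());
    });
    Map<String, Float> pruned = new LinkedHashMap<>();
    for (Map.Entry<String, Float> e : entries.subList(0, Math.min(k, entries.size()))) {
      pruned.put(e.getKey(), e.getValue());
    }
    return pruned;
  }

  public static void main(String[] args) {
    // Same entries inserted in two different orders.
    Map<String, Float> orderA = new LinkedHashMap<>();
    orderA.put("alpha", 1.0f);
    orderA.put("beta", 1.0f);
    orderA.put("gamma", 2.0f);
    Map<String, Float> orderB = new LinkedHashMap<>();
    orderB.put("beta", 1.0f);
    orderB.put("gamma", 2.0f);
    orderB.put("alpha", 1.0f);
    System.out.println(pruneToSize(orderA, 2).keySet()); // [gamma, alpha]
    System.out.println(pruneToSize(orderB, 2).keySet()); // [gamma, alpha]
  }
}
```

Because the comparator defines a total order over (weight, term) pairs, swapping the underlying map implementation should no longer change which terms survive.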
Issue Analytics
- Created 4 years ago
- Comments: 10 (4 by maintainers)
Top GitHub Comments
Okay, here are the results, on Robust04:
Note that the tuned “fixed” results use the old parameter settings, without retuning.
cf: https://github.com/castorini/anserini/blob/master/docs/experiments-forum2018.md
For the record, these are the commands:
It doesn’t seem right to not fix a bug because it would change numbers. Isn’t this the correct, desired outcome of a bug fix? Fix the bug, update the tests…? It doesn’t seem right to use / cite an RM3 implementation that is incorrect…?