Insidious fastutil, FeatureVector, and RM3 bug: massive regression impact!

I was trying to upgrade fastutil from version 6.5.6 (an ancient version from Jun 14, 2013) to the latest, version 8.3.0, when I came across a really insidious multi-part bug. The tl;dr is that there’s a bug in RM3, which will affect all regressions. Here’s the full story:

The class FeatureVector is built around the fastutil Object2FloatOpenHashMap class, which is used by the RM3 implementation to estimate relevance models. In the current implementation, when estimating the relevance model for the feedback docs, we truncate each individual feedback document:

docVector.pruneToSize(fbTerms);

This is the first part of the bug. Just because we ultimately want to select fbTerms terms for feedback doesn’t mean that we should only consider fbTerms terms from each document. This was probably done for performance reasons, although query latency really isn’t affected. I checked: on my iMac Pro, query latency doesn’t increase with that line removed.
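To make the effect of per-document truncation concrete, here is a minimal sketch. It is not Anserini's RM3 code: the class and method names are hypothetical, the weights are made up, and the relevance model is simplified to a plain sum of term weights over the feedback docs (no document-score weighting or normalization). The term "x" is never among the top two terms of any single document, yet it has the highest weight once all three documents are combined.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; not part of Anserini.
class PerDocPruningDemo {

    // Return the k highest-weighted terms, heaviest first.
    // All weights in this demo are distinct, so tie-breaking does not arise here.
    static List<String> topTerms(Map<String, Float> vec, int k) {
        List<Map.Entry<String, Float>> entries = new ArrayList<>(vec.entrySet());
        entries.sort((a, b) -> Float.compare(b.getValue(), a.getValue()));
        List<String> terms = new ArrayList<>();
        for (Map.Entry<String, Float> e : entries.subList(0, Math.min(k, entries.size()))) {
            terms.add(e.getKey());
        }
        return terms;
    }

    // Sum term weights over the feedback docs (a simplification of RM3),
    // optionally truncating each document to its own top fbTerms first,
    // then select fbTerms terms from the combined model.
    static List<String> feedbackTerms(List<Map<String, Float>> docs, int fbTerms,
                                      boolean pruneEachDoc) {
        Map<String, Float> model = new HashMap<>();
        for (Map<String, Float> doc : docs) {
            List<String> kept = pruneEachDoc
                ? topTerms(doc, fbTerms)
                : new ArrayList<>(doc.keySet());
            for (String term : kept) {
                model.merge(term, doc.get(term), Float::sum);
            }
        }
        return topTerms(model, fbTerms);
    }

    // Made-up feedback documents: "x" is the weakest term in each one.
    static List<Map<String, Float>> sampleDocs() {
        return List.of(
            Map.of("a", 0.50f, "b", 0.30f, "x", 0.20f),
            Map.of("c", 0.45f, "d", 0.35f, "x", 0.20f),
            Map.of("e", 0.40f, "f", 0.38f, "x", 0.22f));
    }

    public static void main(String[] args) {
        List<Map<String, Float>> docs = sampleDocs();
        System.out.println(feedbackTerms(docs, 2, false)); // [x, a]
        System.out.println(feedbackTerms(docs, 2, true));  // [a, c]
    }
}
```

With truncation, "x" is discarded three times before its weights can be summed, so the final selection differs even though fbTerms is the same in both runs.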

Now this leads to the second part of the bug: the method pruneToSize sorts the features by weight, but it doesn’t consistently perform tie breaking. This means tie breaking is implementation specific, which means that the fastutil upgrade changed the tie-breaking behavior, which means that different terms are selected from documents, which changes the results.

Insert facepalm here.

So to fix this, we need to:

1. Stop pruning the term selection from individual docs.
2. Implement consistent tie-breaking behavior in the FeatureVector implementation, to prevent future issues along these lines.
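One way the consistent tie-breaking could look is sketched below. This is an assumption about the shape of the fix, not the actual FeatureVector code: it uses plain java.util maps instead of fastutil, the class name is hypothetical, and the pruneToSize signature here is invented for illustration. The idea is to sort by weight descending and fall back to the term itself when weights are equal, so the outcome no longer depends on the iteration order of whatever map holds the features.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; not the Anserini FeatureVector class.
class DeterministicPrune {

    // Weight descending; equal weights fall back to term order, so the
    // result never depends on the map's iteration order.
    static final Comparator<Map.Entry<String, Float>> BY_WEIGHT_THEN_TERM = (a, b) -> {
        int byWeight = Float.compare(b.getValue(), a.getValue());
        return byWeight != 0 ? byWeight : a.getKey().compareTo(b.getKey());
    };

    // Return the k terms that survive pruning, in sorted order.
    static List<String> pruneToSize(Map<String, Float> features, int k) {
        List<Map.Entry<String, Float>> entries = new ArrayList<>(features.entrySet());
        entries.sort(BY_WEIGHT_THEN_TERM);
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Float> e : entries.subList(0, Math.min(k, entries.size()))) {
            kept.add(e.getKey());
        }
        return kept;
    }

    public static void main(String[] args) {
        // Three terms tie at 0.25; only one of them fits after "top".
        Map<String, Float> hashed = new HashMap<>();
        hashed.put("zebra", 0.25f);
        hashed.put("apple", 0.25f);
        hashed.put("mango", 0.25f);
        hashed.put("top", 0.5f);

        // Same features in a map with a deliberately different iteration order.
        Map<String, Float> reversed = new TreeMap<>(Comparator.reverseOrder());
        reversed.putAll(hashed);

        System.out.println(pruneToSize(hashed, 2));   // [top, apple]
        System.out.println(pruneToSize(reversed, 2)); // [top, apple]
    }
}
```

Any total order on terms would do as the secondary key; lexicographic order is just the simplest choice that survives a change of hash map implementation.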

lintool commented, Dec 10, 2019

Okay, here are the results, on Robust04:

AP                           Paper 1   Paper 2
BM25+RM3 (default)           0.2903    0.2903
BM25+RM3 (default): fixed    0.2920    0.2920
BM25+RM3 (tuned)             0.3043    0.3021
BM25+RM3 (tuned): fixed      0.3004    0.2989

Note that the tuned “fixed” results use the old parameter settings, without retuning.

cf: https://github.com/castorini/anserini/blob/master/docs/experiments-forum2018.md

For the record, these are the commands:

python src/main/python/fine_tuning/reconstruct_robus04_tuned_run.py \
 --index lucene-index.robust04.pos+docvectors+rawdocs \
 --folds src/main/resources/fine_tuning/robust04-paper1-folds.json \
 --params src/main/resources/fine_tuning/params/params.map.robust04-paper1-folds.bm25+rm3.json \
 --output run.robust04.bm25+rm3.paper1.txt


python src/main/python/fine_tuning/reconstruct_robus04_tuned_run.py \
 --index lucene-index.robust04.pos+docvectors+rawdocs \
 --folds src/main/resources/fine_tuning/robust04-paper2-folds.json \
 --params src/main/resources/fine_tuning/params/params.map.robust04-paper2-folds.bm25+rm3.json \
 --output run.robust04.bm25+rm3.paper2.txt


eval/trec_eval.9.0.4/trec_eval src/main/resources/topics-and-qrels/qrels.robust04.txt run.robust04.bm25+rm3.paper1.txt

eval/trec_eval.9.0.4/trec_eval src/main/resources/topics-and-qrels/qrels.robust04.txt run.robust04.bm25+rm3.paper2.txt

daltonj commented, Dec 10, 2019

It doesn’t seem right to not fix a bug because it would change numbers. Isn’t this the correct, desired outcome of a bug fix? Fix the bug, update the tests…? It doesn’t seem right to use / cite an RM3 implementation that is incorrect…?