Can TimeLimitingBulkScorer exponentially grow the window size? [LUCENE-10640]

TimeLimitingBulkScorer scores 100 documents at a time. Unfortunately, each call to BulkScorer#score carries non-zero overhead, since the bulk scorer needs to set the scorer, figure out how to combine the Scorer with the competitive iterator of the collector, etc. Larger windows of doc IDs would help amortize these costs.

Could we grow the window of scored doc IDs exponentially, maybe with guarantees such as making sure that the new window is at most 50% of the doc IDs that have been scored so far, so that this exponential growth could only exceed the configured timeout by 50%?
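
One way to read the proposal above, as a minimal sketch rather than anything taken from Lucene: the next window is capped at half of the doc IDs scored so far, with a floor so the first windows are not empty. The class name CappedWindowSketch, the method nextWindow and the minWindow parameter are hypothetical names chosen for illustration. The 50% bound on overshoot also assumes that scoring time is roughly proportional to the number of doc IDs scored, and it does not hold while the floor is in effect.

```java
// Hypothetical sketch, not Lucene code: window size capped at 50% of the
// doc IDs scored so far, with a minimum window size as a floor.
class CappedWindowSketch {

    // Returns the size of the next window of doc IDs to score.
    // scoredSoFar: number of doc IDs scored in previous windows.
    // minWindow: assumed floor, e.g. the current fixed size of 100.
    static int nextWindow(long scoredSoFar, int minWindow) {
        long cap = scoredSoFar / 2; // at most 50% of what was scored so far
        return (int) Math.max(minWindow, Math.min(cap, Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        long scored = 0;
        for (int i = 0; i < 8; i++) {
            int window = nextWindow(scored, 100);
            System.out.println("window " + i + ": " + window + " docs");
            scored += window;
        }
    }
}
```

With a floor of 100, the window stays at 100 until 300 doc IDs have been scored, then grows by roughly 50% per iteration.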

Migrated from LUCENE-10640 by Adrien Grand (@jpountz)

Top GitHub Comments

jpountz commented, Nov 8, 2022

Sorry for the confusion, I was thinking of not relying on any timing info at all besides the one that is already encapsulated by the QueryTimeout object. Just relying on the fact that if we haven’t hit the timeout yet, and then score a window that is 50% larger, then we wouldn’t exceed the timeout by much.

E.g. increasing the window by 50% on every iteration in order to limit the overhead of timeout checks.

collect 100 docs
collect 150 docs
collect 225 docs
collect 337 docs
collect 505 docs
collect 757 docs - Worst-case scenario: the timeout was hit right after we started collecting this window, so we collected 757 docs after the timeout. Yet we had collected 1317 docs (100 + 150 + 225 + 337 + 505) before the timeout, so we only exceeded the expected timeout by about 57% (757 / 1317).
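
The arithmetic in this example can be reproduced with a short sketch. This is an illustration of the growth rule described in the comment, not the actual TimeLimitingBulkScorer code; the class and method names are hypothetical, and integer division is assumed because it matches the sizes listed above (225 grows to 337, not 337.5).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: grow the window by 50% per iteration and compute the
// worst-case overshoot, measured in docs, when the timeout fires at the very
// start of the last window.
class WindowGrowthSketch {

    // Returns the first `count` window sizes, starting from `initial`.
    static List<Integer> windows(int initial, int count) {
        List<Integer> result = new ArrayList<>();
        int window = initial;
        for (int i = 0; i < count; i++) {
            result.add(window);
            window += window / 2; // +50%, rounded down
        }
        return result;
    }

    // Ratio of docs in the last window to docs collected in all earlier windows.
    static double worstCaseOvershoot(List<Integer> windows) {
        long before = 0;
        for (int i = 0; i < windows.size() - 1; i++) {
            before += windows.get(i);
        }
        return (double) windows.get(windows.size() - 1) / before;
    }

    public static void main(String[] args) {
        List<Integer> w = windows(100, 6);
        System.out.println(w); // [100, 150, 225, 337, 505, 757]
        System.out.println(worstCaseOvershoot(w)); // 757 / 1317, a bit over 0.57
    }
}
```

The overshoot is expressed in docs, so it only translates into a time overshoot under the assumption that each doc costs about the same to score.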

It’s not very smart, but I like that: