Can TimeLimitingBulkScorer exponentially grow the window size? [LUCENE-10640]
TimeLimitingBulkScorer scores 100 documents at a time. Unfortunately, bulk scorers have non-zero overhead for each call to BulkScorer#score, since they need to set the scorer, figure out how to combine the Scorer with the competitive iterator of the collector, etc. Larger windows of doc IDs would help better amortize such costs.
Could we grow the window of scored doc IDs exponentially, maybe with a guarantee such as keeping each new window to at most 50% of the doc IDs that have been scored so far? That way, the exponential growth could exceed the configured timeout by at most 50%.
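The 50% figure can be made explicit with a short back-of-the-envelope derivation. This is only a sketch: it assumes the cost of scoring is roughly proportional to the number of doc IDs scored, which the issue does not state and which real queries may not satisfy. The symbols n_k, w_{k+1}, t_k, c and T are introduced here for illustration.

```latex
% Assumption: scoring cost is roughly proportional to the number of doc IDs scored.
% n_k     : doc IDs scored after k windows
% w_{k+1} : size of the next window
% t_k     : elapsed time after k windows
% c       : (assumed constant) cost per doc ID
% T       : configured timeout
\begin{align*}
w_{k+1} &\le \tfrac{1}{2}\, n_k
  && \text{(proposed cap on the new window)} \\
t_k \approx c\, n_k &\le T
  && \text{(timeout not yet hit before window } k+1\text{)} \\
t_{k+1} \approx c\,(n_k + w_{k+1}) &\le \tfrac{3}{2}\, c\, n_k \le \tfrac{3}{2}\, T
  && \text{(overshoot of at most } 50\%\text{)}
\end{align*}
```

Since n_0 = 0, the very first windows would need some fixed minimum size (for example the current 100 documents), so the bound only applies once enough doc IDs have been scored.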
Migrated from LUCENE-10640 by Adrien Grand (@jpountz)
Top GitHub Comments
Sorry for the confusion, I was thinking of not relying on any timing info at all besides the one that is already encapsulated by the QueryTimeout object. Just relying on the fact that if we haven't hit the timeout yet, and then score a window that is 50% larger, then we wouldn't exceed the timeout by much.
E.g. increasing the window by 50% on every iteration in order to limit the overhead of timeout checks. It's not very smart, but I like that: [...] QueryTimeout object, only whether the timeout was hit or not yet.
It is worth it. Nobody wants to debug test failures that happen because NTP skewed the clock.
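The idea from the comments above (grow the window by 50% on every iteration and only ask the timeout whether it has been hit) could be sketched roughly as follows. This is not Lucene's implementation: the class GrowingWindowSketch, the RangeScorer callback and the scoreAll method are hypothetical names, the BooleanSupplier stands in for a QueryTimeout-style "should we exit?" check, and the starting window of 100 is taken from the issue description.

```java
import java.util.function.BooleanSupplier;

public class GrowingWindowSketch {

  /** Hypothetical callback that scores doc IDs in [min, max); stands in for a bulk scorer. */
  interface RangeScorer {
    void score(int min, int max);
  }

  /** Starting window size, matching the 100 documents mentioned in the issue. */
  static final int INITIAL_WINDOW = 100;

  /**
   * Scores [0, maxDoc) window by window, growing the window by 50% after each one.
   * The timeout is consulted once per window and only answers "hit or not yet";
   * no clock is read here. Returns the number of doc IDs scored before stopping.
   */
  static int scoreAll(RangeScorer scorer, BooleanSupplier shouldExit, int maxDoc) {
    int scored = 0;
    int window = INITIAL_WINDOW;
    while (scored < maxDoc) {
      if (shouldExit.getAsBoolean()) {
        break; // assumption: a real scorer would likely signal the timeout to its caller
      }
      int end = (int) Math.min((long) scored + window, maxDoc);
      scorer.score(scored, end);
      scored = end;
      // Grow by 50%, using long arithmetic to avoid int overflow on huge indexes.
      window = (int) Math.min((long) window + (window >> 1), Integer.MAX_VALUE);
    }
    return scored;
  }

  public static void main(String[] args) {
    int[] calls = {0};
    int scored = scoreAll((min, max) -> calls[0]++, () -> false, 1_000_000);
    System.out.println(scored + " doc IDs scored in " + calls[0] + " score() calls");
  }
}
```

With a fixed window of 100, one million doc IDs would need 10,000 score calls and timeout checks; with the growth above the number of calls grows only logarithmically with the number of doc IDs. The trade-off is that the last window before a timeout can be large, so the overshoot depends on how evenly the scoring cost is spread across doc IDs.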