Can TimeLimitingBulkScorer exponentially grow the window size? [LUCENE-10640]

TimeLimitingBulkScorer scores 100 documents at a time. Unfortunately, bulk scorers incur non-zero overhead on every call to BulkScorer#score, since they need to set the scorer, figure out how to combine the Scorer with the competitive iterator of the collector, etc. Larger windows of doc IDs would help amortize these costs better.

Could we grow the window of scored doc IDs exponentially, maybe with guarantees such as making sure that the new window is at most 50% of doc IDs that have been scored so far so that this exponential growth could only exceed the configured timeout by 50%?
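As a rough illustration of the cap described above, here is a minimal sketch of a window-size rule that never lets the next window exceed 50% of the doc IDs scored so far, except while the base window of 100 still applies. The class name, `BASE_WINDOW`, and `nextWindow` are hypothetical and not part of Lucene. The "exceed the timeout by 50%" reading also assumes that scoring time is roughly proportional to the number of doc IDs scored.

```java
public class CappedWindowSketch {

    // Current fixed window size mentioned in the issue; used as a floor here (assumption).
    static final int BASE_WINDOW = 100;

    /**
     * Hypothetical rule: the next window is half of what has been scored so far,
     * but never smaller than the base window.
     */
    static int nextWindow(long scoredSoFar) {
        long half = Math.min(scoredSoFar / 2, Integer.MAX_VALUE);
        return (int) Math.max(BASE_WINDOW, half);
    }
}
```

With this rule the window stays at 100 until 200 doc IDs have been scored, then grows in proportion to the work already done.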


Migrated from LUCENE-10640 by Adrien Grand (@jpountz)

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

jpountz commented, Nov 8, 2022 (1 reaction)

Sorry for the confusion, I was thinking of not relying on any timing info at all besides the one that is already encapsulated by the QueryTimeout object. Just relying on the fact that if we haven’t hit the timeout yet, and then score a window that is 50% larger, then we wouldn’t exceed the timeout by much.

E.g. increasing the window by 50% on every iteration in order to limit the overhead of timeout checks.

  • collect 100 docs
  • collect 150 docs
  • collect 225 docs
  • collect 337 docs
  • collect 505 docs
  • collect 757 docs - Worst-case scenario: the timeout was hit right after we started collecting this window, so we collected 757 docs after the timeout. Yet we had collected 1317 docs (100 + 150 + 225 + 337 + 505) before the timeout, so we only exceeded the expected timeout by 57% (757 / 1317).

It’s not very smart, but I like that:

  • It’s simple.
  • It doesn’t require exposing how much time is left on the QueryTimeout object, only whether the timeout was hit or not yet.

rmuir commented, Nov 8, 2022 (0 reactions)

It is worth it. Nobody wants to debug test failures that happen because NTP skewed the clock.
