Optimization: valuesAtQuantiles qdigest functions

I noticed that these quantile functions on a qdigest compute the percentile values by looping over the provided percentiles array argument and calling the underlying airlift qdigest method getQuantile (singular).

However, the airlift qdigest object has a plural version of this method getQuantiles which seems to be more efficient because it only traverses the qdigest tree once for a given array of quantiles.
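To illustrate the difference between the two access patterns, here is a minimal sketch that uses a sorted array of values with counts as a stand-in for the digest. It is not airlift's implementation (the real qdigest is a tree), and the class name, fields, and step counter are hypothetical; only the method names mirror getQuantile and getQuantiles. The sketch assumes a non-empty digest and quantiles in [0, 1].

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a digest: distinct values with their counts.
public class QuantileScanSketch {
    private final long[] values;  // distinct values, sorted ascending
    private final long[] counts;  // how many times each value was seen
    private final long total;     // sum of all counts
    public long steps = 0;        // entries examined, for comparison only

    public QuantileScanSketch(long[] values, long[] counts) {
        this.values = values;
        this.counts = counts;
        long sum = 0;
        for (long c : counts) {
            sum += c;
        }
        this.total = sum;
    }

    // Singular lookup: every call restarts the scan from the first entry.
    public long getQuantile(double q) {
        double rank = q * total;
        long seen = 0;
        for (int i = 0; i < values.length; i++) {
            steps++;
            seen += counts[i];
            if (seen >= rank) {
                return values[i];
            }
        }
        return values[values.length - 1];
    }

    // Plural lookup: quantiles must be ascending, so one forward scan
    // can answer all of them without restarting.
    public List<Long> getQuantiles(List<Double> quantiles) {
        List<Long> result = new ArrayList<>();
        long seen = 0;
        int i = 0;
        for (double q : quantiles) {
            double rank = q * total;
            while (i < values.length - 1 && seen + counts[i] < rank) {
                steps++;
                seen += counts[i];
                i++;
            }
            result.add(values[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] values = {10, 20, 30, 40, 50};
        long[] counts = {1, 1, 1, 1, 1};
        List<Double> quantiles = Arrays.asList(0.1, 0.5, 0.9);

        QuantileScanSketch looped = new QuantileScanSketch(values, counts);
        List<Long> one = new ArrayList<>();
        for (double q : quantiles) {
            one.add(looped.getQuantile(q));
        }

        QuantileScanSketch batched = new QuantileScanSketch(values, counts);
        List<Long> all = batched.getQuantiles(quantiles);

        System.out.println(one + " in " + looped.steps + " steps");
        System.out.println(all + " in " + batched.steps + " steps");
    }
}
```

Both paths return the same values; the batched path simply examines fewer entries, which is the saving the plural method appears to offer.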

I haven’t had a chance to quantify this optimisation yet, but wanted to gauge whether there’d be any opposition to a pull request on this topic?

Thanks

Issue Analytics

State:
Created 5 years ago
Reactions: 2
Comments: 5 (5 by maintainers)

Top GitHub Comments

2 reactions

blrnw3 commented, Feb 21, 2019

I did some perf testing on this and interestingly found no measurable difference between the two approaches. I tested on the TPC-H data at several scale factors, evaluating 20 percentiles with each of the two approaches outlined in the issue summary. It seems that building the qdigest is so dominant that extracting the percentiles has trivial cost, regardless of how it is done.

2 reactions

tdcmeehan commented, Jan 24, 2019

I agree with @mehrdad-honarkhah, and we’ll be happy to review a pull request for this optimization.

Please be aware that the plural version of the function validates that the quantiles are sorted in ascending order. To preserve compatibility with approx_percentile and existing users of values_at_quantiles, we don’t want to introduce that validation to the values_at_quantiles function. I think it would also be worthwhile to port this over to approx_percentile, as it currently does the same thing.
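
One way to use a sorted-only batch lookup while still accepting quantiles in any order is to sort a copy, query once, and map the results back to the caller's order. The following is a minimal sketch of that idea, not the actual Presto change: the class name, the valuesAtQuantiles helper, and the batchLookup parameter are hypothetical, with batchLookup standing in for a plural method such as getQuantiles that requires ascending input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

public class UnsortedQuantilesSketch {
    // batchLookup is a stand-in for a lookup that only accepts
    // quantiles sorted in ascending order.
    public static List<Long> valuesAtQuantiles(
            List<Double> quantiles,
            Function<List<Double>, List<Long>> batchLookup) {
        int n = quantiles.size();

        // order[i] is the original position of the i-th smallest quantile.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> quantiles.get(i)));

        // Build the ascending list that the batch lookup requires.
        List<Double> sorted = new ArrayList<>(n);
        for (int idx : order) {
            sorted.add(quantiles.get(idx));
        }

        List<Long> sortedValues = batchLookup.apply(sorted);

        // Put each value back at the position the caller asked for it.
        Long[] result = new Long[n];
        for (int i = 0; i < n; i++) {
            result[order[i]] = sortedValues.get(i);
        }
        return Arrays.asList(result);
    }

    public static void main(String[] args) {
        // Toy lookup for demonstration only: maps quantile q to round(q * 100).
        Function<List<Double>, List<Long>> toyLookup = qs -> {
            List<Long> out = new ArrayList<>();
            for (double q : qs) {
                out.add(Math.round(q * 100));
            }
            return out;
        };
        System.out.println(valuesAtQuantiles(Arrays.asList(0.9, 0.1, 0.5), toyLookup));
    }
}
```

The extra sort costs O(n log n) in the number of requested quantiles, which is typically small next to the cost of building the digest.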