Optimization: valuesAtQuantiles qdigest functions
I noticed that these quantile functions on a qdigest compute the percentile values by looping over the provided percentiles array argument and calling the underlying airlift qdigest method getQuantile (singular) once per entry.
However, the airlift qdigest object has a plural version of this method, getQuantiles, which seems to be more efficient because it only traverses the qdigest tree once for a given array of quantiles.
I haven’t had a chance to quantify this optimisation yet, but wanted to gauge whether there’d be any opposition to a pull request on this topic?
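To make the difference between the two call patterns concrete, here is a minimal sketch. It does not use the airlift class: ToyDigest is a hypothetical stand-in that stores sorted bucket values with counts, and its getQuantile and getQuantiles methods only mimic the shape of the methods named above. The steps counter is also an assumption, added purely to show how much walking each approach does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy stand-in for a digest (NOT the airlift class): sorted bucket values with counts. */
final class ToyDigest {
    private final long[] values; // bucket values in ascending order, assumed non-empty
    private final long[] counts; // number of items in each bucket
    private final long total;    // total number of items
    long steps = 0;              // buckets stepped over, to compare the two approaches

    ToyDigest(long[] values, long[] counts) {
        this.values = values.clone();
        this.counts = counts.clone();
        long sum = 0;
        for (long c : counts) {
            sum += c;
        }
        this.total = sum;
    }

    /** Singular lookup: every call restarts the walk from the first bucket. */
    long getQuantile(double quantile) {
        return getQuantiles(Collections.singletonList(quantile)).get(0);
    }

    /** Plural lookup: a single walk answers all quantiles, which must be ascending. */
    List<Long> getQuantiles(List<Double> quantiles) {
        List<Long> result = new ArrayList<>(quantiles.size());
        int bucket = 0;
        long countBefore = 0; // items in the buckets before `bucket`
        double previous = Double.NEGATIVE_INFINITY;
        for (double quantile : quantiles) {
            if (quantile < previous) {
                throw new IllegalArgumentException("quantiles must be ascending");
            }
            previous = quantile;
            double target = quantile * total;
            // Advance the shared cursor; it never moves backwards between quantiles.
            while (bucket < values.length - 1 && countBefore + counts[bucket] <= target) {
                countBefore += counts[bucket];
                bucket++;
                steps++;
            }
            result.add(values[bucket]);
        }
        return result;
    }
}

final class QuantileLookupSketch {
    /** Current pattern described in the issue: one singular call per requested quantile. */
    static List<Long> oneByOne(ToyDigest digest, List<Double> quantiles) {
        List<Long> result = new ArrayList<>(quantiles.size());
        for (double quantile : quantiles) {
            result.add(digest.getQuantile(quantile));
        }
        return result;
    }

    /** Proposed pattern: one plural call for the whole array. */
    static List<Long> batched(ToyDigest digest, List<Double> quantiles) {
        return digest.getQuantiles(quantiles);
    }
}
```

Both helpers return the same values for ascending input; the only difference in this toy model is how many buckets are stepped over. Whether that translates into a measurable gain on a real qdigest is exactly what would need benchmarking.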
Thanks
Issue Analytics
- State:
- Created 5 years ago
- Reactions:2
- Comments:5 (5 by maintainers)
Top GitHub Comments
I did some perf testing on this and, interestingly, found no measurable difference between the two approaches. I tested on the TPC-H data at several scale factors, evaluating 20 percentiles with the two approaches outlined in the issue summary. It seems that building the qdigest is so dominant that extracting the percentiles has trivial cost, regardless of how it is done.
I agree with @mehrdad-honarkhah, and we’ll be happy to review a pull request for this optimization.
Please be aware, the plural version of the function validates that the quantiles are sorted in ascending order. To preserve compatibility with approx_percentile and existing users of values_at_quantiles, we don’t want to introduce that validation to the values_at_quantiles function. I think it would also be worthwhile to port this over to approx_percentile, as it does the same thing currently.
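One way to use an ascending-only batch lookup without imposing that ordering on callers is to sort a copy of the quantiles, run the batch call, and scatter the results back into the caller's order. The sketch below illustrates that idea only; it is not the Presto implementation. The class name, the sortedLookup parameter, and the strictLookup helper are all hypothetical, with strictLookup standing in for a plural method that rejects unsorted input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

final class OrderPreservingQuantiles {
    /**
     * Returns one value per requested quantile, in the caller's original order,
     * while only ever handing an ascending list to `sortedLookup`.
     */
    static List<Long> valuesAtQuantiles(
            List<Double> quantiles,
            Function<List<Double>, List<Long>> sortedLookup) {
        int n = quantiles.size();

        // order[rank] = original position of the rank-th smallest quantile.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> quantiles.get(i)));

        List<Double> ascending = new ArrayList<>(n);
        for (int position : order) {
            ascending.add(quantiles.get(position));
        }

        // Single batch call on the sorted copy.
        List<Long> sortedValues = sortedLookup.apply(ascending);

        // Scatter each result back to the position the caller asked for it in.
        Long[] result = new Long[n];
        for (int rank = 0; rank < n; rank++) {
            result[order[rank]] = sortedValues.get(rank);
        }
        return Arrays.asList(result);
    }

    /** Hypothetical stand-in for a batch lookup that insists on ascending input. */
    static List<Long> strictLookup(List<Double> ascending) {
        List<Long> values = new ArrayList<>(ascending.size());
        double previous = Double.NEGATIVE_INFINITY;
        for (double quantile : ascending) {
            if (quantile < previous) {
                throw new IllegalArgumentException("quantiles must be ascending");
            }
            previous = quantile;
            values.add(Math.round(quantile * 100)); // placeholder value, not a real digest
        }
        return values;
    }
}
```

Calling valuesAtQuantiles with an unsorted list and OrderPreservingQuantiles::strictLookup returns values in the order requested without tripping the validation. The extra sort costs O(n log n) in the number of quantiles, which is typically small compared with the digest itself.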