Approx_percentile() implementation gives wrong results with accuracy specified as param
According to the Presto documentation, when we want percentiles with more accurate results we are supposed to use approx_percentile(),
which as per the documentation has this syntax:
By default the accuracy is set to 0.01, but this can be changed in the call; a smaller value gives more accurate results.
However, when I pass accuracy as a parameter in my query, the results are not correct. I ran the following query:
WITH temp AS (
    SELECT 1 AS num
    UNION
    SELECT 5 AS num
    UNION
    SELECT 10 AS num
    UNION
    SELECT 100 AS num
    UNION
    SELECT 200 AS num
    UNION
    SELECT 500 AS num
    UNION
    SELECT 1000 AS num
    UNION
    SELECT 10000 AS num
    UNION
    SELECT 20000 AS num
)
SELECT
    APPROX_PERCENTILE(num, 0.5),
    APPROX_PERCENTILE(num, 0.5, 0.01),
    APPROX_PERCENTILE(num, 0.5, 0.5),
    APPROX_PERCENTILE(num, 0.5, 0.5, 0.001)
FROM temp
In the query above, the results of the first two aggregations should be the same, since the default accuracy is 0.01 as mentioned HERE
But the results I get are completely off:
As we can see, the first function gives the correct median value while the second doesn't.
My intuition is that, instead of calling approx_percentile() with accuracy, Presto is somehow calling approx_percentile() with a weight specified,
i.e. even though it should call THIS
I feel it's somehow calling THIS
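To make the suspicion above concrete, here is a minimal Python sketch of how the same three arguments (num, 0.5, 0.01) can give very different answers depending on whether they are read as (x, percentage, accuracy) or as (x, w, percentage). It is an illustration only: the helper weighted_percentile is hypothetical, and it uses an exact nearest-rank weighted percentile as a stand-in for whatever approximate digest Presto uses internally, so the numbers are not meant to reproduce Presto's actual output.

```python
def weighted_percentile(values, weights, percentage):
    """Exact nearest-rank weighted percentile.

    Hypothetical helper for illustration; Presto uses an approximate
    digest, so this is only a simplified model of its behaviour.
    """
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    threshold = percentage * total
    running = 0.0
    for value, weight in pairs:
        running += weight
        # Return the first value whose cumulative weight reaches the threshold.
        if running >= threshold:
            return value
    return pairs[-1][0]


nums = [1, 5, 10, 100, 200, 500, 1000, 10000, 20000]

# Reading 1: (x, percentage=0.5, accuracy=0.01).
# Accuracy only tunes the approximation, so the exact model ignores it.
as_accuracy = weighted_percentile(nums, [1] * len(nums), 0.5)

# Reading 2: (x, w=0.5, percentage=0.01).
# Every row gets weight 0.5 and we ask for the 1st percentile.
as_weight = weighted_percentile(nums, [0.5] * len(nums), 0.01)

print(as_accuracy)  # 200, the median of the nine values
print(as_weight)    # 1, the smallest value
```

Under this simplified model, the first reading returns the median while the second returns a value near the minimum, which is the kind of discrepancy one would expect if the third argument were being treated as the percentage rather than the accuracy.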
Another suspected cause could be the way SQL actually calls the underlying Java functions: I found that it expects these kinds of parameters in its function calls:
i.e. nowhere does it define a function for approx_percentile(bigint, bigint, double), which should have existed for the weighted case
Issue Analytics
- Created 3 years ago
- Comments: 7 (4 by maintainers)
Top GitHub Comments
Thanks guys for clearing up this confusion. Closing this issue for now.
Also, the current prestodb implementation expects weight to be an INTEGER if we call approx_percentile(x, w, percentage), but the prestosql implementation expects weight to be a double, as can be seen here
@kunalkohli prestosql.io is a fork of this project, prestodb.io. If you are using PrestoDB you should use documentation from prestodb.io. If you are using PrestoSQL you should use the other documentation.