Approx_percentile() implementation gives wrong results with accuracy specified as param

The Presto documentation currently says that, to get more accurate percentile results, we should use approx_percentile(), which according to the documentation has this syntax:

By default the accuracy is set to 0.01, but a smaller value can be passed in the call to get more accurate results.

But when I pass accuracy as a parameter in my query, the results are not correct. I ran the following query:

WITH temp AS (
  SELECT 1 AS num
  UNION
  SELECT 5 AS num
  UNION
  SELECT 10 AS num
  UNION
  SELECT 100 AS num
  UNION
  SELECT 200 AS num
  UNION
  SELECT 500 AS num
  UNION
  SELECT 1000 AS num
  UNION
  SELECT 10000 AS num
  UNION
  SELECT 20000 AS num
)
SELECT
  APPROX_PERCENTILE(num, 0.5),
  APPROX_PERCENTILE(num, 0.5, 0.01),
  APPROX_PERCENTILE(num, 0.5, 0.5),
  APPROX_PERCENTILE(num, 0.5, 0.5, 0.001)
FROM temp

In the query above, the first two aggregations should return the same result, since the default accuracy is 0.01 according to the documentation linked in the original issue.

But the results I get are completely off:

As we can see, the first function gives the correct median value while the second doesn't.
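
As a quick sanity check on what the "correct median value" is, the exact median of the nine values in the query can be computed outside Presto. This is a minimal Python sketch using only the standard library; it computes an exact median, not Presto's approximation, and the variable names are illustrative.

```python
from statistics import median

# The nine values selected in the `temp` CTE of the query above.
values = [1, 5, 10, 100, 200, 500, 1000, 10000, 20000]

# With an odd number of values, the exact median is the middle one
# after sorting (here, the 5th of 9).
exact_median = median(values)
print(exact_median)
```

Any call that really computes the 0.5 percentile of these unweighted values should land on or near this number.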

My intuition is that, instead of calling approx_percentile() with an accuracy argument, Presto is calling the approx_percentile() overload that takes a weight.

That is, even though it should call the accuracy overload (the first link in the original issue), I feel it's somehow calling the weight overload (the second link).
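
To illustrate why that mix-up would change the answer, here is a minimal Python sketch of an exact weighted percentile. It is not Presto's algorithm (Presto uses an approximate sketch structure), and `weighted_percentile` is a hypothetical helper written for this illustration. It only shows how the arguments `(num, 0.5, 0.01)` behave when read as `(x, percentage, accuracy)` versus `(x, w, percentage)`.

```python
def weighted_percentile(values, weights, fraction):
    """Exact weighted percentile: the smallest value whose cumulative
    weight reaches `fraction` of the total weight. Illustrative only."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    threshold = fraction * total
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= threshold:
            return value
    return pairs[-1][0]

nums = [1, 5, 10, 100, 200, 500, 1000, 10000, 20000]

# Reading (num, 0.5, 0.01) as (x, percentage=0.5, accuracy=0.01):
# an unweighted median; the accuracy argument does not change which
# quantile is requested, so it is ignored in this exact computation.
as_accuracy = weighted_percentile(nums, [1] * len(nums), 0.5)

# Reading (num, 0.5, 0.01) as (x, w=0.5, percentage=0.01):
# every row gets weight 0.5 and the 1st percentile is requested.
as_weight = weighted_percentile(nums, [0.5] * len(nums), 0.01)

print(as_accuracy, as_weight)
```

Under the weight reading the call asks for the 1st percentile rather than the median, so a result far from the median would be consistent with this hypothesis. The sketch does not confirm which overload Presto actually resolved.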

Another suspected cause could be the way SQL actually calls the underlying Java functions. I found that SQL expects these parameter types in its function calls:

That is, nowhere is a function defined for approx_percentile(bigint, BIGINT, double), which is what should exist for the weight case.

Issue Analytics

State:
Created 3 years ago
Comments: 7 (4 by maintainers)

Top GitHub Comments

kunalkohli commented, Jul 2, 2020

Thanks, guys, for clearing up this confusion. Closing this issue for now.

Also, the current PrestoDB implementation expects weight to be an INTEGER when we call approx_percentile(x, w, percentage), but the PrestoSQL implementation expects weight to be a double, as can be seen in the link from the original comment.

mbasmanova commented, Jul 2, 2020

@kunalkohli prestosql.io is a fork of this project, prestodb.io. If you are using PrestoDB you should use documentation from prestodb.io. If you are using PrestoSQL you should use the other documentation.