question: percent_rank vs cume_dist

It seems the Ibis percent_rank operation is in fact the cume_dist SQL operation as explained at [1].

So how could we implement the SQL percent_rank operation?

Suggestion 1:

Maybe Ibis percent_rank could take one optional argument, such as cume_dist (default: True). If it is False, we could test the result using [2], but that would mean adding scipy as a dependency.

Any thoughts about this issue?

Refs:
[1] https://github.com/ibis-project/ibis/blob/ae71b3a96a514ab599d92382bb55fd247803deac/ibis/tests/all/test_window.py#L43
[2] https://stackoverflow.com/questions/39823470/getting-postgresql-percent-rank-and-scipy-stats-percentileofscore-results-to-mat

extra ref: https://riptutorial.com/sql/example/27456/percent-rank-and-cume-dist
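To make the difference between the two operations concrete, here is a minimal pure-Python sketch of the standard SQL definitions: percent_rank is (rank - 1) / (n - 1), and cume_dist is the fraction of rows less than or equal to the current row. The function names simply mirror the SQL names; this is an illustration, not Ibis code, and it assumes a single partition ordered ascending with no NULLs.

```python
from bisect import bisect_left, bisect_right


def percent_rank(values):
    """SQL-style PERCENT_RANK over one partition: (rank - 1) / (n - 1)."""
    n = len(values)
    if n <= 1:
        # SQL returns 0 for a partition with a single row.
        return [0.0] * n
    ordered = sorted(values)
    # bisect_left counts rows strictly smaller than v, which equals rank - 1.
    return [bisect_left(ordered, v) / (n - 1) for v in values]


def cume_dist(values):
    """SQL-style CUME_DIST over one partition: rows <= current row, divided by n."""
    n = len(values)
    ordered = sorted(values)
    # bisect_right counts rows less than or equal to v.
    return [bisect_right(ordered, v) / n for v in values]


data = [10, 20, 20, 30]
pr = percent_rank(data)
cd = cume_dist(data)
```

With this input, percent_rank starts at 0 for the smallest value, while cume_dist never returns 0: the first row already yields 1/4.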

Issue Analytics

Created 4 years ago
Comments: 9 (2 by maintainers)

Top GitHub Comments

cpcloud commented, Apr 19, 2022

We’re now correctly implementing percent_rank everywhere, and we have #3590 for adding cume_dist. Closing this out.

xmnlab commented, Dec 18, 2021

Hi @cpcloud, I created this PR a long time ago: https://github.com/ibis-project/ibis/pull/2224

Basically, the percent_rank tests use pandas df.rank(pct=True), but this behaves like SQL CumeDist.

So the easiest way would be to rename the operation to CumeDist. For PercentRank, the test should be implemented manually, as described in that old PR. Some backends would also need to change their translation from percent_rank to cume_dist.

Let me know if you want more information about that.
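A manual pandas check along these lines might look like the sketch below. The helper names expected_percent_rank and expected_cume_dist are hypothetical and are not taken from Ibis or from the PR above. One caveat: rank(pct=True) uses method="average" by default, so it matches cume_dist exactly only when there are no ties; passing method="max" gives the cume_dist value in the tied case as well.

```python
import pandas as pd


def expected_percent_rank(s: pd.Series) -> pd.Series:
    """Hypothetical test helper: SQL percent_rank as (min rank - 1) / (n - 1)."""
    n = s.count()
    if n <= 1:
        # SQL returns 0 for a single-row partition.
        return pd.Series(0.0, index=s.index)
    return (s.rank(method="min") - 1) / (n - 1)


def expected_cume_dist(s: pd.Series) -> pd.Series:
    """Hypothetical test helper: SQL cume_dist as max rank / n."""
    return s.rank(method="max", pct=True)


s = pd.Series([10, 20, 20, 30])
percent_rank_result = expected_percent_rank(s)
cume_dist_result = expected_cume_dist(s)
default_pct_rank = s.rank(pct=True)  # method="average" by default
```

On this tied input the three results differ, which is why a test that compares a backend's percent_rank against df.rank(pct=True) can pass or fail for the wrong reason.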