question: percent_rank vs cume_dist


It seems the Ibis percent_rank operation is in fact the SQL cume_dist operation, as explained at [1].

So how could we implement the SQL percent_rank operation?
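
For reference, here is a minimal plain-Python sketch of the two SQL definitions as they are usually stated: PERCENT_RANK is (rank - 1) / (n - 1), where rank is the minimum rank among ties, and CUME_DIST is the number of rows less than or equal to the current row, divided by n. The function names and sample data are illustrative only; none of this is Ibis code.

```python
def percent_rank(values):
    """SQL-style PERCENT_RANK over one ascending-ordered window.

    rank is the 1-based minimum rank among ties; a single-row window yields 0.
    """
    n = len(values)
    result = []
    for v in values:
        rank = 1 + sum(1 for other in values if other < v)
        result.append((rank - 1) / (n - 1) if n > 1 else 0.0)
    return result


def cume_dist(values):
    """SQL-style CUME_DIST: rows with value <= current row, divided by n."""
    n = len(values)
    return [sum(1 for other in values if other <= v) / n for v in values]


scores = [10, 20, 20, 40]  # made-up data with one tie
print(percent_rank(scores))  # first value is 0.0, last is 1.0
print(cume_dist(scores))     # [0.25, 0.75, 0.75, 1.0]
```

The visible difference is at the low end: percent_rank starts at 0, while cume_dist is always greater than 0.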

Suggestion 1:

  • maybe Ibis percent_rank could take an optional argument such as cume_dist (default: True)
  • if it is False, maybe we can test that using [2], but that means adding scipy as a dependency.

Any thoughts about this issue?

refs:
[1] https://github.com/ibis-project/ibis/blob/ae71b3a96a514ab599d92382bb55fd247803deac/ibis/tests/all/test_window.py#L43
[2] https://stackoverflow.com/questions/39823470/getting-postgresql-percent-rank-and-scipy-stats-percentileofscore-results-to-mat

extra ref: https://riptutorial.com/sql/example/27456/percent-rank-and-cume-dist

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 9 (2 by maintainers)

Top GitHub Comments

1 reaction
cpcloud commented, Apr 19, 2022

We’re now correctly implementing percent_rank everywhere, and we have #3590 for adding cume_dist, closing this out.

0 reactions
xmnlab commented, Dec 18, 2021

Hi @cpcloud, I created this PR a long time ago: https://github.com/ibis-project/ibis/pull/2224

So basically, the percent_rank tests use pandas df.rank(pct=True) … but this behaves like SQL CumeDist.

So the easiest way would be to rename the operation to CumeDist. For PercentRank, the test should be implemented manually, as described in that old PR. Some backends would also need to change their translation from percent_rank to cume_dist.
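
To illustrate the pandas side, a small sketch with made-up data. One caveat, as far as I can tell: rank(pct=True) uses method="average" by default, so it matches CUME_DIST exactly only when there are no ties; with ties, method="max" gives the CUME_DIST values. The PERCENT_RANK line is a manual computation from the usual (rank - 1) / (n - 1) definition and is not necessarily what the old PR does.

```python
import pandas as pd

s = pd.Series([10, 20, 20, 40])  # made-up data with one tie
n = len(s)

# pandas default: average rank among ties, divided by the row count
pandas_pct = s.rank(pct=True)

# CUME_DIST equivalent: max rank among ties, divided by the row count
cume_dist = s.rank(method="max", pct=True)

# PERCENT_RANK computed manually: (min rank - 1) / (n - 1)
percent_rank = (s.rank(method="min") - 1) / (n - 1)

print(pd.DataFrame({
    "value": s,
    "rank(pct=True)": pandas_pct,
    "cume_dist": cume_dist,
    "percent_rank": percent_rank,
}))
```

On this data the tied rows get 0.625 from the default rank(pct=True) but 0.75 from the method="max" variant, so a test that needs exact CUME_DIST behaviour with ties would want the explicit method.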

Let me know if you want more information about that.


Top Results From Across the Web

cume_dist vs percent_rank or difference between - sql
PERCENT_RANK is similar to the CUME_DIST (cumulative distribution) function. The range of values returned by PERCENT_RANK is 0 to 1, inclusive.

What's the Difference Between PERCENT_RANK and ...
For a list of scores, PERCENT_RANK returns the percent of values less than the current score. CUME_DIST, which stands for cumulative ...

percent_rank(), cume_dist() and ntile() - Yugabyte Docs
You can use cume_dist() to answer questions like this: Show me the rows whose score is within the top x% of the window's...

Percent_Rank and Cume_Dist functions in SQL Server 2012
Solution. PERCENT_RANK(): this represents the percentage of values less than the current value in the group, excluding the highest value.

PERCENT_RANK Vs. CUMM_DIST - EDB
During research for my Postgres Window Magic talk, I studied the unusual behavior of percent_rank and cumm_dist (cumulative distribution).
