Rank(pct=True) behaves strangely on big data

Code Sample, a copy-pastable example if possible

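import pandas as pd

# Small frame: ten zeros followed by 1, 2, 3
smallData = pd.DataFrame({'a': [0]*10 + [1,2,3]})
print(smallData.a.rank(pct=True).tail())

# Large frame: 100 million zeros followed by 1, 2, 3
bigData = pd.DataFrame({'a': [0]*100000000 + [1,2,3]})
print(bigData.a.rank(pct=True).tail())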

When I use pd.DataFrame().rank(pct=True) on small data (see the first example), it returns percentile ranks between 0 and 1. However, when the data is big, the returned values are no longer percentages (they exceed 1). Maybe this is the expected output, but I just want to calculate percentiles on big data.

Output

8     0.423077
9     0.423077
10    0.846154
11    0.923077
12    1.000000

99999998     2.980232
99999999     2.980232
100000000    5.960465
100000001    5.960465
100000002    5.960465
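A side note on the second output: the printed values match what you would get if the correct average ranks were divided by 2**24 (16777216) instead of by the real number of rows. 2**24 is the point where a 32-bit float counter incremented by 1 stops growing. This is only an inference from the numbers, not a confirmed description of pandas internals; the variable names below are made up for illustration.

```python
import numpy as np

# A float32 value cannot represent 2**24 + 1, so adding 1 leaves it unchanged.
saturation_point = np.float32(2**24)
counter_is_stuck = (saturation_point + np.float32(1)) == saturation_point

# Correct average ranks for the big example:
# the 100,000,000 tied zeros share rank (1 + 100000000) / 2,
# and the values 1, 2, 3 get ranks 100000001, 100000002, 100000003.
rank_of_zeros = (1 + 100_000_000) / 2
ranks_of_tail = [100_000_001, 100_000_002, 100_000_003]

# Dividing by 2**24 (assumed stuck count) reproduces the reported output.
observed_zeros = round(rank_of_zeros / 2**24, 6)
observed_tail = [round(r / 2**24, 6) for r in ranks_of_tail]

print(counter_is_stuck)
print(observed_zeros)
print(observed_tail)
```

If that reading is right, the ranks themselves are fine and only the divisor is too small.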

Expected Output

I would expect something close to 0.5 for all the zeros and something close to 1 for all other values.
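One possible workaround, sketched here under the assumption that plain rank() (without pct=True) still returns correct ranks on the affected version, is to do the division by hand. The helper name pct_rank is hypothetical and not part of pandas. The small example is used so the result can be checked against the first output above.

```python
import pandas as pd


def pct_rank(series: pd.Series) -> pd.Series:
    """Average rank divided by the number of non-null values (hypothetical helper)."""
    ranks = series.rank(method="average")
    # Series.count() returns a Python int, so the divisor is exact.
    return ranks / series.count()


small = pd.Series([0] * 10 + [1, 2, 3])
result = pct_rank(small)
print(result.tail())
```

On the big example the same helper should give roughly 0.5 for the zeros (50000000.5 / 100000003) and values just below or equal to 1 for the rest, which is what the expected output describes.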

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 28.8.0
Cython: None
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
WillAyd commented, Nov 14, 2018

Ha no worries. Changes are exactly the same so gets to the same spot. Let’s stick with yours

0 reactions
jschendel commented, Nov 14, 2018

@WillAyd : Oops, just saw this after posting a PR of my own!


