DataFrame.describe(percentiles=[]) still returns 50% percentile.

The DataFrame.describe() docs seem to indicate that you can pass percentiles=None to skip computing percentiles. In practice, passing None still computes the default 25%, 50% and 75%. The best I can do is pass an empty list, which computes only the 50th percentile. I would expect an empty list to return no percentiles at all.

Should we allow passing an empty list to not compute any percentiles?

pandas 0.17.1

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df = pd.DataFrame(np.random.randn(10, 6))

In [4]: df.describe(percentiles=None)
Out[4]:
               0          1          2          3          4          5  
count  10.000000  10.000000  10.000000  10.000000  10.000000  10.000000
mean   -0.116736  -0.160728   0.066763  -0.068867  -0.242050   0.390091
std     0.771704   0.837520   0.875747   0.955985   1.093919   0.923464
min    -1.347786  -1.140541  -1.297533  -1.347824  -2.085290  -0.825807
25%    -0.580527  -0.613640  -0.558291  -0.538433  -0.836046  -0.275567
50%    -0.261526  -0.395307   0.007595  -0.248025   0.000515   0.314278
75%     0.329780   0.154053   0.708768   0.407732   0.366278   1.192338
max     1.285276   1.649528   1.485076   1.697162   1.551388   1.762939

In [15]: df.describe(percentiles=[])
Out[15]:
               0          1          2          3          4          5  
count  10.000000  10.000000  10.000000  10.000000  10.000000  10.000000
mean   -0.116736  -0.160728   0.066763  -0.068867  -0.242050   0.390091
std     0.771704   0.837520   0.875747   0.955985   1.093919   0.923464
min    -1.347786  -1.140541  -1.297533  -1.347824  -2.085290  -0.825807
50%    -0.261526  -0.395307   0.007595  -0.248025   0.000515   0.314278
max     1.285276   1.649528   1.485076   1.697162   1.551388   1.762939

Issue Analytics

  • State: open
  • Created 8 years ago
  • Comments: 10 (6 by maintainers)

Top GitHub Comments

1 reaction
dragoljub commented, Dec 19, 2015

Yes block level computation would be great! 👍

The other point I’m making is: Should we have an escape hatch in df.describe() for users that don’t want to compute medians for 1000’s of columns? Even with block level computation the median computation takes several times longer than all the other statistics combined. 🐢
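One possible escape hatch that works without any change to describe() is to ask for the non-percentile statistics directly through DataFrame.agg. The following is a minimal sketch, not something proposed in the issue itself; the frame shape and the seed are arbitrary choices made for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data only: 10 rows, 6 columns, fixed seed for repeatability.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 6)))

# Request only the statistics that describe() reports besides percentiles.
# No median or other quantile is requested here.
summary = df.agg(["count", "mean", "std", "min", "max"])
print(summary)
```

The result has one row per requested statistic and one column per input column, so it lines up with the describe() output minus the percentile rows.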

0 reactions
RhysU commented, Feb 25, 2019

If the empty list always computes the 50th percentile, how about a documentation update indicating that this is expected behavior?
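Until the behavior or the docs change, the percentile rows can be stripped from the result after the fact. This is a rough sketch that assumes percentile rows are the only index labels ending in "%"; it still pays the cost of computing the median, so it only tidies the output.

```python
import numpy as np
import pandas as pd

# Illustrative data only: shape and seed are arbitrary.
df = pd.DataFrame(np.random.default_rng(0).standard_normal((10, 6)))

stats = df.describe(percentiles=[])

# Drop any row whose label looks like a percentile, e.g. "50%".
# If a pandas version returns no percentile rows, this is a no-op.
no_percentiles = stats[~stats.index.str.endswith("%")]
print(no_percentiles)
```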

Top Results From Across the Web

Pandas - pd.DataFrame.describe() - Data Independent
The percentiles of your data: 25%, 50%, 75% by default. Pseudo Code: With your Series or DataFrame, return a Series that tell us...

pandas.DataFrame.describe — pandas 0.20.2 documentation
The percentiles to include in the output. All should fall between 0 and 1. The default is [.25, .5, .75] , which returns...

Optimal way to acquire percentiles of DataFrame rows
You can get use .describe() function like this: # Create Datarame df = pd.DataFrame(np.random.randn(5,3)) # .apply() the .describe() ...

Pandas Describe, Explained - Sharp Sight
Notice that the median (50th percentile) is still included. Also, notice that when we use this parameter, we need to present the percentiles...

Pandas DataFrame | describe method with Examples
Pandas DataFrame.describe(~) method returns a DataFrame containing some descriptive ... Notice how the 50% percentile is still there - this is because it ...
