DataFrame.describe(percentiles=[]) still returns 50% percentile.
The DataFrame.describe() docs seem to indicate that you can pass percentiles=None to skip computing percentiles; however, with that default it still computes 25%, 50% and 75%. The best I can do is pass an empty list, which computes only the 50% percentile. I would expect an empty list to return no percentiles at all.
Should we allow passing an empty list to not compute any percentiles?
pandas 0.17.1
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df = pd.DataFrame(np.random.randn(10, 6))
In [4]: df.describe(percentiles=None)
Out[4]:
               0          1          2          3          4          5
count  10.000000  10.000000  10.000000  10.000000  10.000000  10.000000
mean   -0.116736  -0.160728   0.066763  -0.068867  -0.242050   0.390091
std     0.771704   0.837520   0.875747   0.955985   1.093919   0.923464
min    -1.347786  -1.140541  -1.297533  -1.347824  -2.085290  -0.825807
25%    -0.580527  -0.613640  -0.558291  -0.538433  -0.836046  -0.275567
50%    -0.261526  -0.395307   0.007595  -0.248025   0.000515   0.314278
75%     0.329780   0.154053   0.708768   0.407732   0.366278   1.192338
max     1.285276   1.649528   1.485076   1.697162   1.551388   1.762939
In [15]: df.describe(percentiles=[])
Out[15]:
               0          1          2          3          4          5
count  10.000000  10.000000  10.000000  10.000000  10.000000  10.000000
mean   -0.116736  -0.160728   0.066763  -0.068867  -0.242050   0.390091
std     0.771704   0.837520   0.875747   0.955985   1.093919   0.923464
min    -1.347786  -1.140541  -1.297533  -1.347824  -2.085290  -0.825807
50%    -0.261526  -0.395307   0.007595  -0.248025   0.000515   0.314278
max     1.285276   1.649528   1.485076   1.697162   1.551388   1.762939
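Until an empty list suppresses every percentile, one possible workaround is to drop the percentile rows after the fact. The sketch below assumes percentile rows are always labelled with a trailing "%" (as in the output above); describe_without_percentiles is a hypothetical helper name, not part of pandas. Note that this only hides the median from the result. It does not avoid computing it.

```python
import numpy as np
import pandas as pd


def describe_without_percentiles(df):
    """Hypothetical helper: describe() output with every percentile row removed."""
    summary = df.describe(percentiles=[])
    # Assumption: percentile rows are labelled like "50%"; keep all other rows.
    keep = [label for label in summary.index if not str(label).endswith("%")]
    return summary.loc[keep]


df = pd.DataFrame(np.random.randn(10, 6))
print(describe_without_percentiles(df))
```

Filtering by label rather than dropping "50%" by name keeps the helper working whether or not the installed pandas version still adds the median row.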
Issue Analytics
- Created 8 years ago
- Comments: 10 (6 by maintainers)
Yes block level computation would be great! 👍
The other point I’m making is: should we have an escape hatch in df.describe() for users who don’t want to compute medians for thousands of columns? Even with block-level computation, the median takes several times longer than all the other statistics combined. 🐢
If the empty list always computes the 50th percentile, how about a documentation update indicating that this is the expected behavior?
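As a rough illustration of the escape hatch discussed above, the non-percentile statistics can be requested directly so that no median is computed at all. This is a minimal sketch, assuming a pandas version that provides DataFrame.agg (which is newer than the 0.17.1 release reported in this issue) and an all-numeric frame; summary_without_median is a hypothetical name.

```python
import numpy as np
import pandas as pd


def summary_without_median(df):
    """Hypothetical helper: count/mean/std/min/max only, no percentile work."""
    # Each name is dispatched to the matching DataFrame reduction,
    # so no quantile or median computation is triggered.
    return df.agg(["count", "mean", "std", "min", "max"])


df = pd.DataFrame(np.random.randn(10, 6))
print(summary_without_median(df))
```

The result has the same row labels as describe() minus the percentile rows, so it can stand in wherever only those five statistics are needed.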