DataFrame.describe(percentiles=[]) still returns 50% percentile.

The DataFrame.describe() docs seem to indicate that you can pass percentiles=None to skip percentile computation. However, it still computes the default 25%, 50% and 75%. The best I can do is pass an empty list, which computes only the 50% percentile. I would expect an empty list to produce no percentiles at all.

Should we allow passing an empty list to not compute any percentiles?

pandas 0.17.1

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df = pd.DataFrame(np.random.randn(10,6))

In [4]: df.describe(percentiles=None)
Out[4]:
               0          1          2          3          4          5  
count  10.000000  10.000000  10.000000  10.000000  10.000000  10.000000
mean   -0.116736  -0.160728   0.066763  -0.068867  -0.242050   0.390091
std     0.771704   0.837520   0.875747   0.955985   1.093919   0.923464
min    -1.347786  -1.140541  -1.297533  -1.347824  -2.085290  -0.825807
25%    -0.580527  -0.613640  -0.558291  -0.538433  -0.836046  -0.275567
50%    -0.261526  -0.395307   0.007595  -0.248025   0.000515   0.314278
75%     0.329780   0.154053   0.708768   0.407732   0.366278   1.192338
max     1.285276   1.649528   1.485076   1.697162   1.551388   1.762939

In [15]: df.describe(percentiles=[])
Out[15]:
               0          1          2          3          4          5  
count  10.000000  10.000000  10.000000  10.000000  10.000000  10.000000
mean   -0.116736  -0.160728   0.066763  -0.068867  -0.242050   0.390091
std     0.771704   0.837520   0.875747   0.955985   1.093919   0.923464
min    -1.347786  -1.140541  -1.297533  -1.347824  -2.085290  -0.825807
50%    -0.261526  -0.395307   0.007595  -0.248025   0.000515   0.314278
max     1.285276   1.649528   1.485076   1.697162   1.551388   1.762939
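
Until describe() offers a way to omit percentiles entirely, one workaround is to drop the 50% row after the fact. The following is a minimal sketch, assuming a numeric-only DataFrame; the seed and variable names are illustrative and not taken from the issue. Note that this only hides the median from the output; it is still computed.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
df = pd.DataFrame(rng.standard_normal((10, 6)))

# With percentiles=[], the median is the only percentile row left.
# errors="ignore" keeps this working if a pandas version omits that row.
summary = df.describe(percentiles=[]).drop(index="50%", errors="ignore")
print(summary)
```

The result keeps the count, mean, std, min and max rows for every column.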

Issue Analytics

Created 8 years ago
Comments: 10 (6 by maintainers)

Top GitHub Comments

dragoljub commented, Dec 19, 2015 (1 reaction)

Yes, block-level computation would be great! 👍

The other point I’m making is: should we have an escape hatch in df.describe() for users who don’t want to compute medians for thousands of columns? Even with block-level computation, the median takes several times longer than all the other statistics combined. 🐢
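
One escape hatch that already exists is to bypass describe() and request only the statistics you need through DataFrame.agg, so that no median or other quantile is computed. The following is a sketch assuming all columns are numeric; the 1000-column frame is made up for illustration, and no timing result is claimed here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
wide = pd.DataFrame(rng.standard_normal((10, 1000)))

# Ask only for the non-percentile statistics that describe() reports.
stats = ["count", "mean", "std", "min", "max"]
summary = wide.agg(stats)

print(summary.iloc[:, :3])  # peek at the first three columns
```

The output has one row per requested statistic, in the order given, and one column per input column.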

RhysU commented, Feb 25, 2019 (0 reactions)

If the empty list always computes the 50th percentile, how about a documentation update indicating that this is the expected behavior?
