Include missing data count in pd.DataFrame.describe method
Code Sample
import numpy as np
import pandas as pd

d = {'col1': [1, np.nan], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
df.describe()
Problem description
- Output: (screenshot not preserved)
For numeric columns, the describe method includes only 8 summary statistics (count, mean, std, min, 25%, 50%, 75%, max) and no missing-value count, which is very important in real-world data analysis.
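As a quick check of that list, here is a small sketch using the same toy frame as the code sample. It prints the row labels `describe()` returns for numeric columns and shows that `count` excludes NaN, so the missing count is not reported anywhere.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, np.nan], 'col2': [3, 4]})
summary = df.describe()

# Row labels are the statistics describe() computes for numeric columns.
print(list(summary.index))
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']

# 'count' skips NaN, so col1 reports 1.0 even though the frame has 2 rows.
print(summary.loc['count'].tolist())
# [1.0, 2.0]
```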
To include the missing count, I currently have to use the following code:
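import numpy as np
import pandas as pd

d = {'col1': [1, np.nan], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
des1 = df.describe()
# Count NaN per column and turn it into a one-row frame labelled 'missing'
des2 = df.isnull().sum().to_frame(name='missing').T
pd.concat([des1, des2])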
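The workaround can be wrapped in a reusable function. The sketch below assumes a hypothetical helper name, `describe_with_missing`, which is not part of the pandas API. One design choice is worth noting: by default `describe()` skips non-numeric columns while `isnull().sum()` covers every column, so a plain `pd.concat` would bring those columns back filled with NaN. Reindexing the missing counts to the summary's columns avoids that.

```python
import numpy as np
import pandas as pd


def describe_with_missing(df: pd.DataFrame, **describe_kwargs) -> pd.DataFrame:
    """Return df.describe() with an extra 'missing' row of NA counts.

    Hypothetical helper for illustration; not part of pandas.
    """
    summary = df.describe(**describe_kwargs)
    # Count NA per column, keeping only the columns describe() reported on.
    missing = df.isna().sum().reindex(summary.columns)
    missing_row = missing.to_frame(name='missing').T
    return pd.concat([summary, missing_row])


df = pd.DataFrame({'col1': [1, np.nan], 'col2': [3, 4], 'label': ['a', None]})
print(describe_with_missing(df))                 # numeric columns only
print(describe_with_missing(df, include='all'))  # also includes 'label'
```

Extra keyword arguments are forwarded to `describe()` unchanged, so `include='all'` behaves as it does on the built-in method.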
And the output: (screenshot not preserved)
Expected Output
I would expect the describe method to include a missing-value count.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None python: 3.6.5.final.0 python-bits: 64 OS: Darwin OS-release: 16.7.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: en_US.UTF-8
pandas: 0.23.0 pytest: 3.5.1 pip: 10.0.1 setuptools: 39.1.0 Cython: 0.28.2 numpy: 1.14.3 scipy: 1.1.0 pyarrow: None xarray: None IPython: 6.4.0 sphinx: 1.7.4 patsy: 0.5.0 dateutil: 2.7.3 pytz: 2018.4 blosc: None bottleneck: 1.2.1 tables: 3.4.3 numexpr: 2.6.5 feather: None matplotlib: 2.2.2 openpyxl: 2.5.3 xlrd: 1.1.0 xlwt: 1.2.0 xlsxwriter: 1.0.4 lxml: 4.2.1 bs4: 4.6.0 html5lib: 1.0.1 sqlalchemy: 1.2.7 pymysql: None psycopg2: None jinja2: 2.10 s3fs: None fastparquet: None pandas_gbq: None pandas_datareader: None
Issue Analytics: created 5 years ago; 12 reactions; 7 comments (3 by maintainers).
Top GitHub Comments
Agreed; this is the default behavior of R’s summary(df) function, for obvious reasons. More useful than sd anyway.
`count` is the non-missing length, so I guess you could add `length` (or `size`, which is what we call it).
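The relationship described in the last comment can be made concrete: if a row holding the total length were added, the missing count would simply be that length minus `count`. The sketch below adds such a row by hand; the labels `size` and `missing` are chosen for illustration, and `describe()` itself produces neither.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, np.nan], 'col2': [3, 4]})
summary = df.describe()

# 'size' here is just the number of rows, repeated for every column.
summary.loc['size'] = len(df)
# count is the non-missing length, so missing = size - count.
summary.loc['missing'] = summary.loc['size'] - summary.loc['count']
print(summary.loc[['count', 'size', 'missing']])
```

Deriving the missing count from `size` and `count` gives the same numbers as `df.isna().sum()` without a second pass over the data.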