
Include missing data count in pd.DataFrame.describe method

See original GitHub issue

Code Sample

import numpy as np
import pandas as pd

d = {'col1': [1, np.nan], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
df.describe()

Problem description

Output

By default, the describe method reports only 8 summary statistics for numeric columns (count, mean, std, min, 25%, 50%, 75%, max). It does not report a missing count, which is very important in real-world data analysis.

To include the missing count, I have to use the following code:

d = {'col1': [1, np.nan], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
des1 = df.describe()
des2 = df.isnull().sum().to_frame(name = 'missing').T
pd.concat([des1, des2])

And the output

Expected Output

The describe method should include the missing count.
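As a rough sketch of what the requested behavior might look like, the workaround above can be wrapped in a small helper. The name `describe_with_missing` is hypothetical and is not part of pandas; the sketch assumes it is enough to count missing values for the columns that `describe()` itself reports.

```python
import numpy as np
import pandas as pd


def describe_with_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper (not a pandas API): describe() plus a 'missing' row."""
    summary = df.describe()
    # Count NaN values only for the columns that describe() reported on
    missing = df[summary.columns].isna().sum().to_frame(name="missing").T
    return pd.concat([summary, missing])


d = {"col1": [1, np.nan], "col2": [3, 4]}
df = pd.DataFrame(data=d)
result = describe_with_missing(df)
print(result)
```

The extra row is appended after `max`, so the existing rows of the summary keep their usual order.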

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.2.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

State:
Created 5 years ago
Reactions: 12
Comments: 7 (3 by maintainers)

Top GitHub Comments

3 reactions

petesherick commented, Aug 27, 2019

Agree, this is default behavior of R’s summary(df) function for obvious reasons. More useful than sd anyway.

1 reaction

jreback commented, Nov 18, 2018

count is the non-missing length, so I guess you could add length (or size, which is what we call it)
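To illustrate the relationship described in this comment, here is a minimal sketch using the sample DataFrame from the issue. The variable names are illustrative only. It shows that, per column, the non-missing count plus the missing count equals the number of rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, np.nan], "col2": [3, 4]})

non_missing = df.count()    # what describe() reports as "count"
missing = df.isna().sum()   # NaN values per column
length = len(df)            # total number of rows

# For every column, non-missing + missing equals the number of rows
print((non_missing + missing == length).all())
```

Given the total length, the missing count can therefore be derived from the existing `count` row without a separate statistic.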
