
Include missing data count in pd.DataFrame.describe method


Code Sample

import numpy as np
import pandas as pd
d = {'col1': [1, np.nan], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
df.describe()

Problem description


For numeric data, the describe method only includes eight summary statistics (count, mean, std, min, 25%, 50%, 75%, max) but no missing-value count, which is very important in real-world data analysis.
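A minimal check of what the method returns, reusing the frame from the code sample. With an all-numeric DataFrame and the default percentiles, the summary index holds exactly eight labels, and the count row tallies only non-missing values (1 of 2 rows for col1):

```python
import numpy as np
import pandas as pd

d = {'col1': [1, np.nan], 'col2': [3, 4]}
df = pd.DataFrame(data=d)

desc = df.describe()

# The row labels of the summary: there is no "missing" entry among them
print(list(desc.index))
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']

# 'count' only tallies non-missing values, so col1 reports 1 out of 2 rows
print(desc.loc['count'].tolist())
# [1.0, 2.0]
```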

To include the missing count, I have to use the following code:

import numpy as np
import pandas as pd

d = {'col1': [1, np.nan], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
des1 = df.describe()
# one-row frame holding the number of nulls per column
des2 = df.isnull().sum().to_frame(name='missing').T
pd.concat([des1, des2])
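The workaround above can be wrapped in a small reusable function. `describe_with_missing` is a hypothetical name, not a pandas API. This sketch assumes you only want missing counts for the columns describe() actually summarizes, so it selects `desc.columns` before counting nulls: by default describe() drops non-numeric columns, whereas `isnull().sum()` covers every column, and a plain concat would then add NaN-filled columns to the summary.

```python
import numpy as np
import pandas as pd


def describe_with_missing(frame: pd.DataFrame, **describe_kwargs) -> pd.DataFrame:
    """Return frame.describe() with an extra 'missing' row appended.

    Hypothetical helper wrapping the concat workaround; not part of pandas.
    """
    desc = frame.describe(**describe_kwargs)
    # Count nulls only for the columns that made it into the summary,
    # so non-numeric columns dropped by describe() do not reappear as NaN
    missing = frame[desc.columns].isnull().sum().to_frame(name='missing').T
    return pd.concat([desc, missing])


d = {'col1': [1, np.nan], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
summary = describe_with_missing(df)
# col1 has 1 missing value, col2 has 0
print(summary.loc['missing'])
```

Extra keyword arguments are passed straight to describe(), so `describe_with_missing(df, include='all')` should also append the row for object columns.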


Expected Output

I expect the describe method to include the missing count.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.2.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: open
  • Created 5 years ago
  • Reactions: 12
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

3 reactions
petesherick commented, Aug 27, 2019

Agree, this is the default behavior of R’s summary(df) function, for obvious reasons. More useful than sd anyway.

1 reaction
jreback commented, Nov 18, 2018

count is the non-missing length, so I guess you could add length (or size, which is what we call it).
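jreback's suggestion can be sketched with existing calls: since `count` excludes missing values, a row holding the total length makes the missing tally simply `size - count`. The row label `size` and its position right after `count` are assumptions for illustration; describe() in the version reported above does not emit such a row.

```python
import numpy as np
import pandas as pd

d = {'col1': [1, np.nan], 'col2': [3, 4]}
df = pd.DataFrame(data=d)

desc = df.describe()

# Hypothetical "size" row: total rows per column, regardless of NaN
size = pd.Series(len(df), index=desc.columns, name='size')

# 'count' excludes NaN, so size - count recovers the missing tally
missing = (size - desc.loc['count']).rename('missing')

# Same numbers as counting nulls directly
assert missing.tolist() == df.isnull().sum().tolist()

# Place the size row right after count
summary = pd.concat([desc.iloc[:1], size.to_frame().T, desc.iloc[1:]])
print(summary.index.tolist())
# ['count', 'size', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
```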


