Sum of pd.DataFrame.groupby.sum containing NaN should return NaN?

See original GitHub issue

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np
d = {'l':  ['left', 'right', 'left', 'right', 'left', 'right'],
     'r': ['right', 'left', 'right', 'left', 'right', 'left'],
     'v': [-1, 1, -1, 1, -1, np.nan]}
df = pd.DataFrame(d)

Problem description

When a grouped DataFrame contains np.nan, there is no way to make the group sums behave like numpy.sum or pandas.Series.sum with skipna=False.

With the skipna=False flag, both pd.Series.sum and pd.DataFrame.sum return NaN:

In [235]: df.v.sum(skipna=False)
Out[235]: nan

However, this behavior is not reflected in the pandas.DataFrame.groupby object, which silently skips the NaN:

In [237]: df.groupby('l')['v'].sum()['right']
Out[237]: 2.0

and NaN propagation cannot be forced by applying np.sum directly:

In [238]: df.groupby('l')['v'].apply(np.sum)['right']
Out[238]: 2.0

See this StackOverflow post for a workaround.
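As a minimal sketch of one possible workaround (not necessarily the one in the linked post), the sum can be applied per group with skipna=False, or computed on the raw NumPy values so that pandas' NaN skipping never runs. The names nan_aware and via_numpy are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'l': ['left', 'right', 'left', 'right', 'left', 'right'],
    'r': ['right', 'left', 'right', 'left', 'right', 'left'],
    'v': [-1, 1, -1, 1, -1, np.nan],
})

# Call Series.sum on each group with skipna=False so that a NaN
# inside a group propagates to that group's total.
nan_aware = df.groupby('l')['v'].apply(lambda s: s.sum(skipna=False))

# Alternative: sum the underlying NumPy array, which does not skip NaN.
via_numpy = df.groupby('l')['v'].apply(lambda s: s.values.sum())

print(nan_aware)
print(via_numpy)
```

Both variants go through a Python-level function per group, so they will likely be slower than the built-in groupby sum on large frames.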

Expected Output

In [238]: df.groupby('l')['v'].apply(np.sum)['right']
Out[238]: nan

and

In [237]: df.groupby('l')['v'].sum(skipna=False)['right']
Out[237]: nan

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.1
nose: 1.3.7
pip: 9.0.1
setuptools: 32.3.1
Cython: 0.25.2
numpy: 1.12.0
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: 2.4.0
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.4
lxml: 3.7.0
bs4: 4.5.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.4
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.43.0
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

3 reactions
jorisvandenbossche commented, Mar 13, 2017

you really think so?

Jeff, there are certainly cases imaginable where you don’t want to ignore missing values. And therefore we have that as a keyword.

0 reactions
ajosanchez commented, Mar 14, 2018

It would be nice to have a keyword that returns those NAs in this case: groupby(['x']).resample('D').sum()

When I resample after a groupby, I need an aggregating function. It would be nice not to get zeros when I increase the resolution, so that I can use .fillna(method='ffill').
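For the resampling case, a possible approach in pandas versions that support the min_count argument of sum (it was added after the 0.19.1 release reported above) is to require at least one valid value per bin, so that empty bins come back as NaN instead of 0. The sketch below uses a small made-up frame; ts, default_sum, nan_sum, gap and filled are hypothetical names, and .ffill() stands in for .fillna(method='ffill').

```python
import pandas as pd

# Hypothetical data: group 'a' has no observation on 2020-01-02.
ts = pd.DataFrame(
    {'x': ['a', 'a', 'b', 'b'], 'v': [1.0, 2.0, 3.0, 4.0]},
    index=pd.to_datetime(
        ['2020-01-01', '2020-01-03', '2020-01-01', '2020-01-02']
    ),
)

# Default behavior: the empty daily bin is summed to 0.
default_sum = ts.groupby('x')['v'].resample('D').sum()

# With min_count=1, a bin with no valid values yields NaN instead.
nan_sum = ts.groupby('x')['v'].resample('D').sum(min_count=1)

# The NaN gaps can then be forward-filled within each group.
filled = nan_sum.groupby(level=0).ffill()

gap = ('a', pd.Timestamp('2020-01-02'))
print(default_sum.loc[gap], nan_sum.loc[gap], filled.loc[gap])
```

Note that min_count=1 only turns empty or all-NaN bins into NaN; a bin that mixes valid values and NaN is still summed with the NaN skipped.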
