DataFrame.groupby().sum() treating NaN as 0.0
Code Sample, a copy-pastable example if possible
In [62]: import pandas as pd
In [63]: import numpy as np
In [64]: df = pd.DataFrame(data=[['data1', 2, np.nan], ['data2', 3, 4], ['data3', 4, 4]], index=[1, 2, 3], columns=['a', 'b', 'c'])
In [68]: df
Out[68]:
       a  b    c
1  data1  2  NaN
2  data2  3  4.0
3  data3  4  4.0
In [65]: df.groupby(by=['a','b']).sum(skipna=False)
Out[65]:
           c
a     b
data1 2  0.0
data2 3  4.0
data3 4  4.0
Problem description
The NaN value is being treated as 0.0. Is there an option to keep NaN as NaN so that sum() returns NaN for that group?
Expected Output
           c
a     b
data1 2  NaN
data2 3  4.0
data3 4  4.0
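One possible way to obtain this expected output (a sketch of a workaround, not part of the original report): Series.sum does accept skipna, so it can be applied per group instead of calling the groupby-level sum.

import numpy as np
import pandas as pd

df = pd.DataFrame([['data1', 2, np.nan], ['data2', 3, 4], ['data3', 4, 4]],
                  index=[1, 2, 3], columns=['a', 'b', 'c'])

# Series.sum accepts skipna, so applying it per group keeps NaN for the
# group whose only value in column c is NaN.
print(df.groupby(['a', 'b'])['c'].apply(lambda s: s.sum(skipna=False)))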
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.14.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-327.36.3.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.22.0
pytest: 3.5.0
pip: 9.0.3
setuptools: 39.0.1
Cython: 0.28.2
numpy: 1.14.2
scipy: 1.0.1
pyarrow: 0.9.0
xarray: 0.10.2
IPython: 5.6.0
sphinx: 1.7.2
patsy: 0.5.0
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.2
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.2.1
bs4: 4.3.2
html5lib: 0.999
sqlalchemy: 1.2.6
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Top GitHub Comments
I’m using latest v1.0.1 but still see this issue. Also the min_count=1 argument seems to not work (for timedeltas at least). Any suggestions on how to keep the NaN in a groupby().sum()?

I think you want min_count:
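A minimal sketch of that suggestion, using the DataFrame from this report (min_count is a real parameter of GroupBy.sum; the rest is illustrative): with min_count=1, a group must contain at least one non-NA value for sum() to return a number, otherwise the result is NaN.

import numpy as np
import pandas as pd

df = pd.DataFrame([['data1', 2, np.nan], ['data2', 3, 4], ['data3', 4, 4]],
                  index=[1, 2, 3], columns=['a', 'b', 'c'])

# min_count=1: a group needs at least one non-NA value for sum() to return
# a number; the all-NaN group ('data1', 2) therefore yields NaN, not 0.0.
print(df.groupby(['a', 'b'])['c'].sum(min_count=1))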