ENH: pivot/groupby index with nan

ENH: maybe for now just provide a warning if dropping the nan rows when pivoting…

from ml

http://stackoverflow.com/questions/16860172/python-pandas-pivot-table-silently-drops-indices-with-nans

This is effectively trying to groupby on a NaN, which is currently not allowed, so the row with the NaN key is silently dropped:

# Assumes: import numpy as np; from numpy import nan; from pandas import DataFrame

In [13]: a = [['a', 'b', 12, 12, 12], ['a', nan, 12.3, 233., 12], ['b', 'a', 123.23, 123, 1], ['a', 'b', 1, 1, 1.]]

In [14]: df = DataFrame(a, columns=['a', 'b', 'c', 'd', 'e'])

In [15]: df.groupby(['a','b']).sum()
Out[15]: 
          c    d   e
a b                 
a b   13.00   13  13
b a  123.23  123   1
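The warning suggested in the issue title can be approximated in user code. The following is only a sketch: `warn_on_nan_keys` is a hypothetical helper, not part of pandas, and it assumes the default groupby behaviour shown above, where rows with a NaN key are dropped.

```python
import warnings

import numpy as np
import pandas as pd


def warn_on_nan_keys(df, keys):
    """Warn if any row has NaN in a grouping key; return the number of such rows.

    Hypothetical helper for illustration only; not a pandas API.
    """
    n_missing = int(df[keys].isna().any(axis=1).sum())
    if n_missing:
        warnings.warn(
            f"{n_missing} row(s) have NaN in {keys} and will be dropped by groupby",
            stacklevel=2,
        )
    return n_missing


# Same data as in the session above.
df = pd.DataFrame(
    [["a", "b", 12, 12, 12], ["a", np.nan, 12.3, 233.0, 12],
     ["b", "a", 123.23, 123, 1], ["a", "b", 1, 1, 1.0]],
    columns=["a", "b", "c", "d", "e"],
)

n_dropped = warn_on_nan_keys(df, ["a", "b"])  # emits a UserWarning for row 1
result = df.groupby(["a", "b"]).sum()         # the NaN-keyed row is not in the result
```

Calling the helper right before the `groupby` makes the loss visible without changing the aggregation itself.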

Workaround: fill the NaN keys with a dummy value, pivot, and then replace the dummy with NaN again:


    In [31]: df2 = df.copy()

    In [32]: df2['dummy'] = np.nan

    In [33]: df2['b'] = df2['b'].fillna('dummy')

    In [34]: df2
    Out[34]: 
       a      b       c    d   e  dummy
    0  a      b   12.00   12  12    NaN
    1  a  dummy   12.30  233  12    NaN
    2  b      a  123.23  123   1    NaN
    3  a      b    1.00    1   1    NaN

    In [35]: df2.pivot_table(rows=['a', 'b'], values=['c', 'd', 'e'], aggfunc=sum)
    Out[35]: 
       a      b       c    d   e
    0  a      b   13.00   13  13
    1  a  dummy   12.30  233  12
    2  b      a  123.23  123   1

    In [36]: df2.pivot_table(rows=['a', 'b'], values=['c', 'd', 'e'], aggfunc=sum).replace('dummy',np.nan)
    Out[36]: 
       a    b       c    d   e
    0  a    b   13.00   13  13
    1  a  NaN   12.30  233  12
    2  b    a  123.23  123   1
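For readers on a current pandas, the same workaround might look roughly like the sketch below. It assumes a recent pandas where `pivot_table` takes `index=` rather than the older `rows=` keyword used in the session above. `SENTINEL` is a placeholder name chosen for illustration, and the value must not occur in the real data. The keys are moved back into columns with `reset_index()` before the sentinel is swapped out, because `replace` works on values, not on index labels.

```python
import numpy as np
import pandas as pd

SENTINEL = "dummy"  # illustrative placeholder; must not collide with real key values

df = pd.DataFrame(
    [["a", "b", 12, 12, 12], ["a", np.nan, 12.3, 233.0, 12],
     ["b", "a", 123.23, 123, 1], ["a", "b", 1, 1, 1.0]],
    columns=["a", "b", "c", "d", "e"],
)

# 1. Fill the NaN keys so that the grouping step does not drop them.
filled = df.fillna({"b": SENTINEL})

# 2. Pivot on the filled keys.
pivoted = filled.pivot_table(index=["a", "b"], values=["c", "d", "e"], aggfunc="sum")

# 3. Turn the keys back into ordinary columns and restore NaN.
restored = pivoted.reset_index()
restored["b"] = restored["b"].replace(SENTINEL, np.nan)
```

The trade-off is the usual one for sentinel values: it works only as long as the placeholder cannot appear as a genuine key.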

Issue Analytics

State:
Created 10 years ago
Reactions: 69
Comments: 54 (17 by maintainers)

Top GitHub Comments

21 reactions

hayd commented, Aug 25, 2013

Adding dropna=True to groupby seems reasonable.

The current behaviour is described in the docs as “this is how R works”, but they don’t really say why…
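As far as I know, pandas 1.1 and later accept a `dropna` keyword on `groupby` along these lines, defaulting to `True`. A minimal sketch, assuming such a version is installed and reusing the data from the issue:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["a", "b", 12, 12, 12], ["a", np.nan, 12.3, 233.0, 12],
     ["b", "a", 123.23, 123, 1], ["a", "b", 1, 1, 1.0]],
    columns=["a", "b", "c", "d", "e"],
)

# Default (dropna=True): the row whose key contains NaN is left out.
default = df.groupby(["a", "b"]).sum()

# dropna=False: NaN is kept as a group key of its own.
with_nan = df.groupby(["a", "b"], dropna=False).sum()
```

With `dropna=False` the column totals of the grouped frame match those of the original frame, which is a quick way to check that no rows were lost.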

15 reactions

ottothecow commented, Feb 6, 2018

There really needs to be an option to parse the NA/missing group.

SAS and Stata both offer “missing” options when making crosstabs or summarizing by group variables, and at least for me, I use them more often than not.

Simply saying “R does it” seems like odd logic. Silently dropping missing categories in a groupby is dangerous.