ENH: pivot/groupby index with nan


ENH: maybe for now just provide a warning if dropping the NaN rows when pivoting…

From ml:

http://stackoverflow.com/questions/16860172/python-pandas-pivot-table-silently-drops-indices-with-nans

This is effectively trying to group by a NaN key, which is currently not allowed:

In [13]: a = [['a', 'b', 12, 12, 12], ['a', nan, 12.3, 233., 12], ['b', 'a', 123.23, 123, 1], ['a', 'b', 1, 1, 1.]]

In [14]: df = DataFrame(a, columns=['a', 'b', 'c', 'd', 'e'])

In [15]: df.groupby(['a','b']).sum()
Out[15]: 
          c    d   e
a b                 
a b   13.00   13  13
b a  123.23  123   1
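The session above can be reproduced as a self-contained script. The sketch below also shows one possible way to surface the dropped rows as a warning, in the spirit of the request; `warn_if_nan_keys` is a hypothetical helper written for illustration, not part of pandas.

```python
import warnings

import numpy as np
import pandas as pd

# Same data as in the session above.
rows = [
    ["a", "b", 12, 12, 12],
    ["a", np.nan, 12.3, 233.0, 12],
    ["b", "a", 123.23, 123, 1],
    ["a", "b", 1, 1, 1.0],
]
df = pd.DataFrame(rows, columns=["a", "b", "c", "d", "e"])


def warn_if_nan_keys(frame, keys):
    """Hypothetical helper: warn when any grouping key contains NaN.

    Returns the number of rows that a default groupby would drop.
    """
    n_dropped = int(frame[keys].isna().any(axis=1).sum())
    if n_dropped:
        warnings.warn(
            f"{n_dropped} row(s) have NaN in {keys} and will be dropped by groupby"
        )
    return n_dropped


n_dropped = warn_if_nan_keys(df, ["a", "b"])

# Default behaviour: the row whose key contains NaN is left out of the result.
result = df.groupby(["a", "b"]).sum()
```

Calling the helper before the `groupby` keeps the check explicit and leaves the grouping itself unchanged.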

Workaround: fill the NaN keys with a dummy value, pivot, then replace the dummy with NaN again


    In [31]: df2 = df.copy()

    In [32]: df2['dummy'] = np.nan

    In [33]: df2['b'] = df2['b'].fillna('dummy')

    In [34]: df2
    Out[34]: 
       a      b       c    d   e  dummy
    0  a      b   12.00   12  12    NaN
    1  a  dummy   12.30  233  12    NaN
    2  b      a  123.23  123   1    NaN
    3  a      b    1.00    1   1    NaN

    In [35]: df2.pivot_table(rows=['a', 'b'], values=['c', 'd', 'e'], aggfunc=sum)
    Out[35]: 
       a      b       c    d   e
    0  a      b   13.00   13  13
    1  a  dummy   12.30  233  12
    2  b      a  123.23  123   1

    In [36]: df2.pivot_table(rows=['a', 'b'], values=['c', 'd', 'e'], aggfunc=sum).replace('dummy',np.nan)
    Out[36]: 
       a    b       c    d   e
    0  a    b   13.00   13  13
    1  a  NaN   12.30  233  12
    2  b    a  123.23  123   1
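The same workaround, sketched as a standalone script for newer pandas releases, where `pivot_table` takes `index=` instead of the `rows=` keyword used in the session above. The `SENTINEL` name is an assumption made for this example, and the approach only works if that value never occurs as a real key.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ["a", "b", 12, 12, 12],
        ["a", np.nan, 12.3, 233.0, 12],
        ["b", "a", 123.23, 123, 1],
        ["a", "b", 1, 1, 1.0],
    ],
    columns=["a", "b", "c", "d", "e"],
)

SENTINEL = "dummy"  # assumed not to occur as a real value in column "b"

# Step 1: replace NaN keys with the sentinel so they survive the grouping.
df2 = df.copy()
df2["b"] = df2["b"].fillna(SENTINEL)

# Step 2: pivot on the filled keys.
table = df2.pivot_table(index=["a", "b"], values=["c", "d", "e"], aggfunc="sum")

# Step 3: move the keys back into columns and turn the sentinel back into NaN.
flat = table.reset_index()
flat["b"] = flat["b"].mask(flat["b"] == SENTINEL)
```

Restoring NaN on the flattened frame rather than inside the index avoids editing index levels directly.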

Issue Analytics

  • State: closed
  • Created 10 years ago
  • Reactions: 69
  • Comments:54 (17 by maintainers)

Top GitHub Comments

hayd commented, Aug 25, 2013 (21 reactions)

adding dropna=True to groupby seems reasonable.
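For readers on a recent pandas release: `groupby` does accept a `dropna` argument (added in pandas 1.1, to the best of my knowledge), with `dropna=True` as the default. A minimal sketch of the difference, using a reduced version of the data from the issue:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "a": ["a", "a", "b", "a"],
        "b": ["b", np.nan, "a", "b"],
        "c": [12, 12.3, 123.23, 1],
    }
)

# Default (dropna=True): rows whose key contains NaN are left out.
dropped = df.groupby(["a", "b"])["c"].sum()

# dropna=False keeps NaN as its own group key.
kept = df.groupby(["a", "b"], dropna=False)["c"].sum()
```

Here `dropped` has two groups and `kept` has three, the extra one being the `("a", NaN)` key.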

Behaviour currently in docs as “this is how R works”, but doesn’t really say why…

ottothecow commented, Feb 6, 2018 (15 reactions)

There really needs to be an option to parse the NA/missing group.

SAS and Stata both offer “missing” options when making crosstabs or summarizing by group variables, and at least for me, I use them more often than not.

Simply saying “R does it” seems like odd logic. Silently dropping missing categories in a groupby is dangerous.
