ENH: pivot/groupby index with nan
ENH: maybe for now just provide a warning if dropping the nan rows when pivoting…
From the mailing list:
This is effectively trying to group by a NaN key, which is currently not allowed. The row with NaN in column b is silently dropped:
In [13]: a = [['a', 'b', 12, 12, 12], ['a', np.nan, 12.3, 233., 12], ['b', 'a', 123.23, 123, 1], ['a', 'b', 1, 1, 1.]]
In [14]: df = DataFrame(a, columns=['a', 'b', 'c', 'd', 'e'])
In [15]: df.groupby(['a','b']).sum()
Out[15]:
          c    d   e
a b
a b   13.00   13  13
b a  123.23  123   1
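The suggestion in the title (warn rather than silently drop the NaN rows) can be sketched as a small wrapper. `groupby_warn_nan` is a hypothetical helper, not a pandas API: it only counts rows whose grouping columns contain NaN, warns, and then delegates to the ordinary `DataFrame.groupby`.

```python
import warnings

import numpy as np
import pandas as pd


def groupby_warn_nan(df, keys, **kwargs):
    """Hypothetical helper (not part of pandas): warn before NaN-key rows are dropped."""
    cols = [keys] if isinstance(keys, str) else list(keys)
    # Rows where any grouping column is NaN are excluded by a default groupby.
    n_missing = int(df[cols].isna().any(axis=1).sum())
    if n_missing and kwargs.get("dropna", True):
        warnings.warn(
            f"{n_missing} row(s) with NaN in {cols} will be dropped by groupby",
            UserWarning,
            stacklevel=2,
        )
    return df.groupby(cols, **kwargs)


a = [["a", "b", 12, 12, 12], ["a", np.nan, 12.3, 233.0, 12],
     ["b", "a", 123.23, 123, 1], ["a", "b", 1, 1, 1.0]]
df = pd.DataFrame(a, columns=["a", "b", "c", "d", "e"])
summed = groupby_warn_nan(df, ["a", "b"]).sum()  # warns about the one NaN-key row
```

The result is unchanged; only the silent part goes away.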
Workaround: fill the NaN index values with a dummy value, pivot, then replace the dummy with NaN again.
In [31]: df2 = df.copy()
In [32]: df2['dummy'] = np.nan
In [33]: df2['b'] = df2['b'].fillna('dummy')
In [34]: df2
Out[34]:
   a      b       c    d   e  dummy
0  a      b   12.00   12  12    NaN
1  a  dummy   12.30  233  12    NaN
2  b      a  123.23  123   1    NaN
3  a      b    1.00    1   1    NaN
In [35]: df2.pivot_table(rows=['a', 'b'], values=['c', 'd', 'e'], aggfunc=sum)
Out[35]:
   a      b       c    d   e
0  a      b   13.00   13  13
1  a  dummy   12.30  233  12
2  b      a  123.23  123   1
In [36]: df2.pivot_table(rows=['a', 'b'], values=['c', 'd', 'e'], aggfunc=sum).replace('dummy',np.nan)
Out[36]:
   a    b       c    d   e
0  a    b   13.00   13  13
1  a  NaN   12.30  233  12
2  b    a  123.23  123   1
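The `rows=` keyword used above has since been replaced by `index=` in `pivot_table`, so in current pandas the same fill/pivot/replace workaround might look roughly like this. The sentinel string `__missing__` is an arbitrary assumption; it only has to be a value that never occurs as a real key.

```python
import numpy as np
import pandas as pd

SENTINEL = "__missing__"  # assumed placeholder; must never collide with a real key

a = [["a", "b", 12, 12, 12], ["a", np.nan, 12.3, 233.0, 12],
     ["b", "a", 123.23, 123, 1], ["a", "b", 1, 1, 1.0]]
df = pd.DataFrame(a, columns=["a", "b", "c", "d", "e"])

# 1. Fill NaN keys with the sentinel so pivot_table keeps those rows.
filled = df.assign(b=df["b"].fillna(SENTINEL))

# 2. Pivot with the current keyword (index=, formerly rows=).
table = filled.pivot_table(index=["a", "b"], values=["c", "d", "e"], aggfunc="sum")

# 3. Move the keys back into columns and turn the sentinel into NaN again.
result = table.reset_index().replace({"b": {SENTINEL: np.nan}})
print(result)
```

Replacing only within column b (the nested dict) avoids touching any other column that might legitimately contain the sentinel text.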
Issue Analytics
- Created 10 years ago
- Reactions: 69
- Comments: 54 (17 by maintainers)
Adding dropna=True to groupby seems reasonable. The behaviour is currently documented as “this is how R works”, but the docs don't really say why…
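A keyword along these lines did eventually land: since pandas 1.1.0, `DataFrame.groupby` accepts `dropna=False`, which keeps NaN as its own group. A minimal sketch on the data from the issue, assuming pandas >= 1.1:

```python
import numpy as np
import pandas as pd

a = [["a", "b", 12, 12, 12], ["a", np.nan, 12.3, 233.0, 12],
     ["b", "a", 123.23, 123, 1], ["a", "b", 1, 1, 1.0]]
df = pd.DataFrame(a, columns=["a", "b", "c", "d", "e"])

dropped = df.groupby(["a", "b"]).sum()             # default dropna=True: NaN-key row excluded
kept = df.groupby(["a", "b"], dropna=False).sum()  # NaN kept as its own group
print(kept)
```

The default is still `dropna=True`, so the silent drop remains the out-of-the-box behaviour.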
There really needs to be an option to keep the NA/missing group.
SAS and Stata both offer “missing” options when making crosstabs or summarizing by group variables, and I, at least, use them more often than not.
Simply saying “R does it” seems like odd logic. Silently dropping missing categories in a groupby is dangerous.
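For the crosstab-style counts that the SAS and Stata “missing” options produce, one rough pandas analogue (assuming pandas >= 1.3, where `DataFrame.value_counts` gained a `dropna` argument) is to count key combinations without discarding NaN. This is only an approximation of those options, not a reimplementation of them.

```python
import numpy as np
import pandas as pd

# Same grouping keys as in the issue: one row has a missing value in column b.
df = pd.DataFrame({"a": ["a", "a", "b", "a"], "b": ["b", np.nan, "a", "b"]})

with_missing = df.value_counts(subset=["a", "b"], dropna=False)  # includes the (a, NaN) cell
without_missing = df.value_counts(subset=["a", "b"])             # default drops it silently
print(with_missing)
```

Comparing the two totals is a quick way to spot how many rows a default grouping would lose.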