"Function does not reduce" for multiindex, but works fine for single index.
When I use pd.Series.tolist as a reducer with a single-column groupby, it works. When I do the same with a multiindex, it does not.
It seems the “fast” cython groupby function, which has no quarrel with reducing into lists, throws an exception if the index is “complex”, which seems to mean multiindex. When that exception is caught, the groupby function falls back to the “pure_python” groupby, which throws a new exception if the reducing function returns a list.
Is this a bug or is there some logic to this which is not apparent to me?
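To make the suspected control flow concrete, here is a toy model of it in plain Python. This is not pandas source code: every name in it (fast_aggregate, pure_python_aggregate, ComplexIndexError, the multi_key flag) is hypothetical and only mirrors the behaviour described above.

```python
# Toy model only: none of these names exist in pandas.

class ComplexIndexError(Exception):
    """Stands in for whatever the fast path raises on a multi-key index."""


def fast_aggregate(groups, func, multi_key):
    # Hypothetical fast path: accepts any return value,
    # but refuses multi-key groupings.
    if multi_key:
        raise ComplexIndexError("fast path cannot handle a multi-key index")
    return {key: func(values) for key, values in groups.items()}


def pure_python_aggregate(groups, func):
    # Hypothetical slow path: handles any grouping,
    # but rejects results that are lists.
    result = {}
    for key, values in groups.items():
        out = func(values)
        if isinstance(out, list):
            raise ValueError("Function does not reduce")
        result[key] = out
    return result


def aggregate(groups, func, multi_key):
    # Try the fast path first; fall back to the slow path if it gives up.
    try:
        return fast_aggregate(groups, func, multi_key)
    except ComplexIndexError:
        return pure_python_aggregate(groups, func)
```

In this model, aggregate({"foo": [1, 2]}, list, multi_key=False) succeeds, while the same call with multi_key=True raises ValueError, which is the asymmetry I am seeing.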
Reproduce:
import pandas as pd
from numpy.random import randn
s1 = pd.Series(randn(5), index=['a', 'b', 'c', 'd', 'e'])
df = pd.DataFrame([s1], columns=['a', 'b', 'c', 'd', 'e'])
for i in range(0, 10):
    s1 = pd.Series(randn(5), index=['a', 'b', 'c', 'd', 'e'])
    df2 = pd.DataFrame([s1], columns=['a', 'b', 'c', 'd', 'e'])
    df = pd.concat([df, df2])
df['gk'] = 'foo'
df['gk2'] = 'bar'
# This works.
df.groupby(['gk']).agg(pd.Series.tolist)
# This does not.
df.groupby(['gk', 'gk2']).agg(pd.Series.tolist)
Issue Analytics
- State:
- Created 10 years ago
- Comments:17 (9 by maintainers)
How does using tolist save you any data? It's the same data, just in a list, and comparisons are then hard.
I think you can use one of these:
df.groupby(keys).apply(lambda x: x._get_numeric_data().abs().sum())
or another function that effectively hashes a row together.
df.groupby(['gk','gk2']).agg(lambda x: tuple(x.tolist()))
will do what you want with the multi-indexes (or single index); as a tuple it is inferred as a reduction.
Yes, that’s the output I want. Assuming I sort by [‘e’,‘a’] first, this is probably faster, too. It is still very convenient to be able to create list values.
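For anyone landing here later, a minimal sketch of the tuple workaround on a small deterministic frame. The column values are made up for illustration; only the groupby/agg call comes from the suggestion above.

```python
import pandas as pd

# Small illustrative frame; the values are arbitrary.
df = pd.DataFrame({
    "gk": ["foo", "foo", "foo"],
    "gk2": ["bar", "bar", "bar"],
    "a": [1.0, 2.0, 3.0],
    "b": [4.0, 5.0, 6.0],
})

# Multi-key groupby: return a tuple per group instead of a list,
# so the result is treated as a single reduced value.
result = df.groupby(["gk", "gk2"]).agg(lambda x: tuple(x.tolist()))

# Each cell of the result holds the tuple of values for that group and column.
first_a = result["a"].iloc[0]
```

The result is indexed by (gk, gk2), and each cell can be turned back into a list with list(...) if a mutable value is really needed.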
Looking at the history of this restriction, it looks like it was accidentally introduced by a transcription error in f3c0a081e2cfc8e073f8461cac5c242d0e4219d0 - at the time, it was an assertion, and it went from
where, as far as I can tell, dummy is uninitialized, to
The original assertion was added without comment in 71e9046c52246535d4db1f350e82c3a84d748f88, in response to #612 “Pure python multi-key groupby can’t handle non-numeric results”. Which reveals another oddity: groupby().agg(pd.Series.tolist) works fine for single-key groupings; it only fails for multi-key groupings.