"Function does not reduce" for multiindex, but works fine for single index.

When I use pd.Series.tolist as a reducer with a single-column groupby, it works. When I do the same with a multi-column groupby (which produces a MultiIndex), it does not.

It seems the “fast” Cython groupby function, which has no quarrel with reducing into lists, throws an exception if the index is “complex”, which seems to mean a MultiIndex. When that exception is caught, groupby falls back to the “pure_python” path, which throws a new exception if the reducing function returns a list.

Is this a bug or is there some logic to this which is not apparent to me?

Reproduce:

import numpy as np
import pandas as pd

cols = ['a', 'b', 'c', 'd', 'e']

# Start with one row of random values, then append ten more rows.
s1 = pd.Series(np.random.randn(5), index=cols)
df = pd.DataFrame([s1], columns=cols)
for i in range(10):
    s1 = pd.Series(np.random.randn(5), index=cols)
    df2 = pd.DataFrame([s1], columns=cols)
    df = pd.concat([df, df2])

# Two constant grouping keys.
df['gk'] = 'foo'
df['gk2'] = 'bar'

# This works.
df.groupby(['gk']).agg(pd.Series.tolist)

# This does not.
df.groupby(['gk', 'gk2']).agg(pd.Series.tolist)

Issue Analytics

  • State: closed
  • Created: 10 years ago
  • Comments: 17 (9 by maintainers)

Top GitHub Comments

jreback commented, Jul 22, 2013 (3 reactions)

How does using tolist save you any data? It's the same data, just in a list, and comparisons are then hard.

I think you can do one of these:

  • df.groupby(keys).apply(lambda x: x._get_numeric_data().abs().sum()) or another function that effectively hashes a row together
  • df.groupby(['gk','gk2']).agg(lambda x: tuple(x.tolist())) will do what you want with a MultiIndex (or a single index); as a tuple, the result is inferred as a reduction
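
As a minimal sketch of the second suggestion, assuming a recent pandas version in which tuple-valued aggregation results are accepted: the small frame below is made-up illustration data that only reuses the column names gk and gk2 from the report.

```python
import pandas as pd

# Made-up illustration data; only the key names come from the report.
df = pd.DataFrame({
    "gk": ["foo", "foo", "baz"],
    "gk2": ["bar", "bar", "qux"],
    "a": [1.0, 2.0, 3.0],
})

# Collect each group's values into a tuple rather than a list.
result = df.groupby(["gk", "gk2"]).agg(lambda x: tuple(x.tolist()))

# Look up one cell by its two-level key.
cell = result.loc[("foo", "bar"), "a"]
print(result)
print(cell)
```

If a list is needed afterwards, the tuples can be converted back per cell, for example with result["a"].map(list).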

nbateshaus commented, Jun 22, 2016 (1 reaction)

Yes, that’s the output I want. Assuming I sort by ['e', 'a'] first, this is probably faster, too. It is still very convenient to be able to create list values.

Looking at the history of this restriction, it looks like it was accidentally introduced by a transcription error in f3c0a081e2cfc8e073f8461cac5c242d0e4219d0 - at the time, it was an assertion, and it went from

assert(not (isinstance(res, list) and len(res) == len(self.dummy)))

where, as far as I can tell, dummy is uninitialized, to

assert(not isinstance(res, list))
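
To make the difference between the two conditions concrete, here is a small plain-Python sketch. It is not the pandas source: the function names original_check and transcribed_check are hypothetical, and dummy is passed in as an explicit, sized argument, which is an assumption, since its actual role is unclear.

```python
def original_check(res, dummy):
    # Condition from the earlier assertion: only a list whose length
    # equals len(dummy) is rejected.
    return not (isinstance(res, list) and len(res) == len(dummy))


def transcribed_check(res):
    # Condition from the later assertion: every list is rejected.
    return not isinstance(res, list)


dummy = [0, 0, 0]            # stand-in for self.dummy (assumed to be sized)
short_list = [1, 2]          # a list whose length differs from len(dummy)
as_tuple = tuple(short_list)

checks = {
    "original, short list": original_check(short_list, dummy),
    "transcribed, short list": transcribed_check(short_list),
    "transcribed, tuple": transcribed_check(as_tuple),
}
print(checks)
```

Under these assumptions, the short list passes the first condition but not the second, while a tuple passes the second, which is consistent with the tuple-based workaround suggested earlier in the thread.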

The original assertion was added without comment in 71e9046c52246535d4db1f350e82c3a84d748f88, in response to #612 “Pure python multi-key groupby can’t handle non-numeric results”. Which reveals another oddity: groupby().agg(pd.Series.tolist) works fine for single-key groupings; it only fails for multi-key groupings.

>>> eav.groupby(['attributeName']).agg(pd.Series.tolist)
                      recordId  \
attributeName                    
author         [1, 1, 1, 1, 1]   
title                      [1]   

                                                           value  
attributeName                                                     
author         [Foto N. Afrati, Vinayak Borkar, Michael Carey...  
title              [Map-Reduce Extensions and Recursive Queries]  
>>> eav.groupby(['recordId', 'attributeName']).agg(pd.Series.tolist)
Traceback (most recent call last):
  File "/Users/nik/anaconda/envs/python3/lib/python3.5/site-packages/pandas/core/groupby.py", line 1863, in agg_series
    return self._aggregate_series_fast(obj, func)
  File "/Users/nik/anaconda/envs/python3/lib/python3.5/site-packages/pandas/core/groupby.py", line 1868, in _aggregate_series_fast
    func = self._is_builtin_func(func)
AttributeError: 'BaseGrouper' object has no attribute '_is_builtin_func'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/nik/anaconda/envs/python3/lib/python3.5/site-packages/pandas/core/groupby.py", line 3597, in aggregate
    return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
  File "/Users/nik/anaconda/envs/python3/lib/python3.5/site-packages/pandas/core/groupby.py", line 3122, in aggregate
    return self._python_agg_general(arg, *args, **kwargs)
  File "/Users/nik/anaconda/envs/python3/lib/python3.5/site-packages/pandas/core/groupby.py", line 777, in _python_agg_general
    result, counts = self.grouper.agg_series(obj, f)
  File "/Users/nik/anaconda/envs/python3/lib/python3.5/site-packages/pandas/core/groupby.py", line 1865, in agg_series
    return self._aggregate_series_pure_python(obj, func)
  File "/Users/nik/anaconda/envs/python3/lib/python3.5/site-packages/pandas/core/groupby.py", line 1899, in _aggregate_series_pure_python
    raise ValueError('Function does not reduce')
ValueError: Function does not reduce
