Aggregating over arrays

Storing arrays inside DataFrames is frowned upon, but is there a reason for this raise?

Deleting the raising lines seems to break only the tests that check that they raise…

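import numpy as np
import pandas as pd

# Each row stores a whole NumPy array in the 'arraydata' column.
df = pd.DataFrame([[1, np.array([10, 20, 30])],
                   [1, np.array([40, 50, 60])],
                   [2, np.array([20, 30, 40])]],
                  columns=['category', 'arraydata'])
g = df.groupby('category')
g.agg(sum)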
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-34-527a2010b455> in <module>()
----> 1 g.agg(sum)

/Users/andy/pandas/pandas/core/groupby.py in agg(self, func, *args, **kwargs)
    337     @Appender(_agg_doc)
    338     def agg(self, func, *args, **kwargs):
--> 339         return self.aggregate(func, *args, **kwargs)
    340
    341     def _iterate_slices(self):

/Users/andy/pandas/pandas/core/groupby.py in aggregate(self, arg, *args, **kwargs)
   1740             cyfunc = _intercept_cython(arg)
   1741             if cyfunc and not args and not kwargs:
-> 1742                 return getattr(self, cyfunc)()
   1743
   1744             if self.grouper.nkeys > 1:

/Users/andy/pandas/pandas/core/groupby.py in f(self)
     62             raise SpecificationError(str(e))
     63         except Exception:
---> 64             result = self.aggregate(lambda x: npfunc(x, axis=self.axis))
     65             if _convert:
     66                 result = result.convert_objects()

/Users/andy/pandas/pandas/core/groupby.py in aggregate(self, arg, *args, **kwargs)
   1745                 return self._python_agg_general(arg, *args, **kwargs)
   1746             else:
-> 1747                 result = self._aggregate_generic(arg, *args, **kwargs)
   1748
   1749         if not self.as_index:

/Users/andy/pandas/pandas/core/groupby.py in _aggregate_generic(self, func, *args, **kwargs)
   1803                     result[name] = self._try_cast(func(data, *args, **kwargs),data)
   1804             except Exception:
-> 1805                 return self._aggregate_item_by_item(func, *args, **kwargs)
   1806         else:
   1807             for name in self.indices:

/Users/andy/pandas/pandas/core/groupby.py in _aggregate_item_by_item(self, func, *args, **kwargs)
   1828                 colg = SeriesGroupBy(obj[item], selection=item,
   1829                                      grouper=self.grouper)
-> 1830                 result[item] = colg.aggregate(func, *args, **kwargs)
   1831             except ValueError:
   1832                 cannot_agg.append(item)

/Users/andy/pandas/pandas/core/groupby.py in aggregate(self, func_or_funcs, *args, **kwargs)
   1425                 return self._python_agg_general(func_or_funcs, *args, **kwargs)
   1426             except Exception:
-> 1427                 result = self._aggregate_named(func_or_funcs, *args, **kwargs)
   1428
   1429             index = Index(sorted(result), name=self.grouper.names[0])

/Users/andy/pandas/pandas/core/groupby.py in _aggregate_named(self, func, *args, **kwargs)
   1509             output = func(group, *args, **kwargs)
   1510             if isinstance(output, np.ndarray):
-> 1511                 raise Exception('Must produce aggregated value')
   1512             result[name] = self._try_cast(output, group)
   1513

Exception: Must produce aggregated value

http://stackoverflow.com/questions/16975318/pandas-aggregate-when-column-contains-numpy-arrays
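
One way to sidestep the raise is to do the reduction in NumPy, group by group, instead of going through `agg`. This is a minimal sketch and not something proposed in the issue: the helper name `sum_arrays_by_group` is hypothetical, and it assumes every array within a group has the same shape.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'category': [1, 1, 2],
    'arraydata': [np.array([10, 20, 30]),
                  np.array([40, 50, 60]),
                  np.array([20, 30, 40])],
})


def sum_arrays_by_group(frame, key, column):
    """Hypothetical helper: element-wise sum of the arrays in each group."""
    totals = {}
    for name, group in frame.groupby(key):
        # Stack the group's arrays into a 2-D array, then sum down the rows.
        totals[name] = np.sum(np.stack(list(group[column])), axis=0)
    return totals


result = sum_arrays_by_group(df, 'category', 'arraydata')
```

The result is a plain dict keyed by group label, so pandas never has to decide whether an array counts as an aggregated value.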

Issue Analytics

  • State: closed
  • Created 10 years ago
  • Comments: 13 (11 by maintainers)

Top GitHub Comments

5 reactions
leezu commented, Aug 23, 2017

This seems only partially fixed. Taking the example from the test case

        df = pd.DataFrame([[1, np.array([10, 20, 30])],
                           [1, np.array([40, 50, 60])],
                           [2, np.array([20, 30, 40])]],
                          columns=['category', 'arraydata'])

The following will work (that's the test case): result = df.groupby('category').agg(sum)

But this will fail: result = df.groupby('category')["arraydata"].agg(sum)

/home/data/lelausen/.local/lib/python3.6/site-packages/pandas/core/groupby.py in f(self, **kwargs)
   1153                 except Exception:
   1154                     result = self.aggregate(
-> 1155                         lambda x: npfunc(x, axis=self.axis))
   1156                     if _convert:
   1157                         result = result._convert(datetime=True)

/home/data/lelausen/.local/lib/python3.6/site-packages/pandas/core/groupby.py in aggregate(self, func_or_funcs, *args, **kwargs)
   2883                 return self._python_agg_general(func_or_funcs, *args, **kwargs)
   2884             except Exception:
-> 2885                 result = self._aggregate_named(func_or_funcs, *args, **kwargs)
   2886 
   2887             index = Index(sorted(result), name=self.grouper.names[0])

/home/data/lelausen/.local/lib/python3.6/site-packages/pandas/core/groupby.py in _aggregate_named(self, func, *args, **kwargs)
   3015             output = func(group, *args, **kwargs)
   3016             if isinstance(output, (Series, Index, np.ndarray)):
-> 3017                 raise Exception('Must produce aggregated value')
   3018             result[name] = self._try_cast(output, group)
   3019 

Exception: Must produce aggregated value

Or in a similar case:

----> 1 g["mean"].agg(lambda x: np.mean(x))

/home/data/lelausen/.local/lib/python3.6/site-packages/pandas/core/groupby.py in aggregate(self, func_or_funcs, *args, **kwargs)
   2878 
   2879             if self.grouper.nkeys > 1:
-> 2880                 return self._python_agg_general(func_or_funcs, *args, **kwargs)
   2881 
   2882             try:

/home/data/lelausen/.local/lib/python3.6/site-packages/pandas/core/groupby.py in _python_agg_general(self, func, *args, **kwargs)
    846         for name, obj in self._iterate_slices():
    847             try:
--> 848                 result, counts = self.grouper.agg_series(obj, f)
    849                 output[name] = self._try_cast(result, obj, numeric_only=True)
    850             except TypeError:

/home/data/lelausen/.local/lib/python3.6/site-packages/pandas/core/groupby.py in agg_series(self, obj, func)
   2178             return self._aggregate_series_fast(obj, func)
   2179         except Exception:
-> 2180             return self._aggregate_series_pure_python(obj, func)
   2181 
   2182     def _aggregate_series_fast(self, obj, func):

/home/data/lelausen/.local/lib/python3.6/site-packages/pandas/core/groupby.py in _aggregate_series_pure_python(self, obj, func)
   2213                 if (isinstance(res, (Series, Index, np.ndarray)) or
   2214                         isinstance(res, list)):
-> 2215                     raise ValueError('Function does not reduce')
   2216                 result = np.empty(ngroups, dtype='O')
   2217 

ValueError: Function does not reduce
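
Both tracebacks end in the same kind of guard: the aggregating function is called once per group, and the call is rejected if it returns something array-like. The sketch below is a simplified stand-in for that check, not pandas' actual implementation. The function name `aggregate_named_sketch` and the list-of-tuples `groups` input are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def aggregate_named_sketch(groups, func):
    """Hypothetical stand-in for the guard visible in the tracebacks."""
    result = {}
    for name, group in groups:
        output = func(group)
        # Array-like outputs are treated as "not reduced" and rejected.
        if isinstance(output, (pd.Series, pd.Index, np.ndarray)):
            raise Exception('Must produce aggregated value')
        result[name] = output
    return result


groups = [
    (1, [np.array([10, 20, 30]), np.array([40, 50, 60])]),
    (2, [np.array([20, 30, 40])]),
]

# Summing arrays element-wise yields an ndarray, so the guard fires.
try:
    aggregate_named_sketch(groups, sum)
    raised = False
except Exception as exc:
    raised = True
    message = str(exc)

# Reducing each group all the way to a scalar passes the guard.
scalar_result = aggregate_named_sketch(
    groups, lambda g: int(sum(a.sum() for a in g)))
```

Under this reading, the function is rejected because of the type of its return value, not because the sum itself is ill-defined.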

1 reaction
hayd commented, Jun 7, 2013

It seems no less well-defined than passing lists.

Agreed that you lose all the fast operations, but if that’s the only reason shouldn’t this be left up to the user…?
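
To illustrate the "fast operations" trade-off mentioned above: when all the arrays have the same length, they can be spread into ordinary numeric columns so that the built-in `sum` applies. This is a sketch under that assumption and is not suggested anywhere in the thread. The variable name `wide` is made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'category': [1, 1, 2],
    'arraydata': [np.array([10, 20, 30]),
                  np.array([40, 50, 60]),
                  np.array([20, 30, 40])],
})

# One numeric column per array position, indexed by category.
wide = pd.DataFrame(np.stack(list(df['arraydata'])),
                    index=pd.Index(df['category'], name='category'))

# Ordinary numeric columns, so the regular groupby sum applies.
summed = wide.groupby(level=0).sum()
```

Each row of `summed` holds the element-wise total for one category. The cost is that the array structure now lives in the column layout rather than in a single cell.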
