BUG: DataFrame.agg - why numpy.size doesn't work?

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]],
                  columns=['A', 'B', 'C'])

# String names work on an ungrouped DataFrame
df.agg({'A': ['mean', 'std', 'size']})

# Somehow this just doesn't work with DataFrame.agg but works with DataFrameGroupBy.agg
df.agg({'A': [np.mean, np.std, np.size]})

Problem description

Intuitively, I assumed df.agg({'A':[np.mean,np.std,np.size]}) would work the same way df.agg({'A':['mean','std','size']}) does, but it doesn't. I wonder why? I looked through the docs but still didn't get it.

Expected Output

A

4.0 3.0 4.0

Output of df.agg({'A':[np.mean,np.std,np.size]})


TypeError                                 Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\base.py in _aggregate_multiple_funcs(self, arg, _axis)
    553     try:
--> 554         return concat(results, keys=keys, axis=1, sort=False)
    555     except TypeError:

~\anaconda3\lib\site-packages\pandas\core\reshape\concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    280         copy=copy,
--> 281         sort=sort,
    282     )

~\anaconda3\lib\site-packages\pandas\core\reshape\concat.py in __init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
    356     )
--> 357     raise TypeError(msg)
    358

TypeError: cannot concatenate object of type '<class 'float'>'; only Series and DataFrame objs are valid

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-39-051b5cf01f85> in <module>
      1 import numpy as np
----> 2 df.agg({'A':[np.mean,np.std,np.size]})

~\anaconda3\lib\site-packages\pandas\core\frame.py in aggregate(self, func, axis, *args, **kwargs)
   6704     result = None
   6705     try:
-> 6706         result, how = self._aggregate(func, axis=axis, *args, **kwargs)
   6707     except TypeError:
   6708         pass

~\anaconda3\lib\site-packages\pandas\core\frame.py in _aggregate(self, arg, axis, *args, **kwargs)
   6718         result = result.T if result is not None else result
   6719         return result, how
-> 6720     return super()._aggregate(arg, *args, **kwargs)
   6721
   6722 agg = aggregate

~\anaconda3\lib\site-packages\pandas\core\base.py in _aggregate(self, arg, *args, **kwargs)
    426
    427     try:
--> 428         result = _agg(arg, _agg_1dim)
    429     except SpecificationError:
    430

~\anaconda3\lib\site-packages\pandas\core\base.py in _agg(arg, func)
    393     result = {}
    394     for fname, agg_how in arg.items():
--> 395         result[fname] = func(fname, agg_how)
    396     return result
    397

~\anaconda3\lib\site-packages\pandas\core\base.py in _agg_1dim(name, how, subset)
    377         "nested dictionary is ambiguous in aggregation"
    378     )
--> 379     return colg.aggregate(how)
    380
    381 def _agg_2dim(name, how):

~\anaconda3\lib\site-packages\pandas\core\series.py in aggregate(self, func, axis, *args, **kwargs)
   3686     # Validate the axis parameter
   3687     self._get_axis_number(axis)
-> 3688     result, how = self._aggregate(func, *args, **kwargs)
   3689     if result is None:
   3690

~\anaconda3\lib\site-packages\pandas\core\base.py in _aggregate(self, arg, *args, **kwargs)
    484     elif is_list_like(arg):
    485         # we require a list, but not an 'str'
--> 486         return self._aggregate_multiple_funcs(arg, _axis=_axis), None
    487     else:
    488         result = None

~\anaconda3\lib\site-packages\pandas\core\base.py in _aggregate_multiple_funcs(self, arg, _axis)
    562     result = Series(results, index=keys, name=self.name)
    563     if is_nested_object(result):
--> 564         raise ValueError("cannot combine transform and aggregation operations")
    565     return result
    566

ValueError: cannot combine transform and aggregation operations

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

3 reactions
attack68 commented, Jun 24, 2021

Actually this works:

df = pd.DataFrame([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]],
                  columns=['A', 'B', 'C'])

df.agg({'A':['mean','std']})
df.agg({'A':[np.mean,np.std]})

The only thing relevant to your issue is:

df.agg({'A':['size']})
df.agg({'A':[np.size]})
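
To see how each of these spellings behaves on whatever pandas version is installed, a small comparison harness can help. This is only a sketch: the names `specs` and `outcomes` are illustrative, and the results are deliberately not asserted here because they may differ between pandas releases.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=["A", "B", "C"])

# The four spellings from the comment above; the keys are labels for printing only.
specs = {
    "['mean', 'std']": ["mean", "std"],
    "[np.mean, np.std]": [np.mean, np.std],
    "['size']": ["size"],
    "[np.size]": [np.size],
}

outcomes = {}
for label, funcs in specs.items():
    try:
        outcomes[label] = df.agg({"A": funcs})
    except Exception as exc:  # record the failure instead of stopping
        outcomes[label] = exc

for label, outcome in outcomes.items():
    print(f"df.agg({{'A': {label}}}) ->")
    print(repr(outcome) if isinstance(outcome, Exception) else outcome)
```

Each entry in `outcomes` is either the aggregation result or the exception that was raised, so the four cases can be compared side by side.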
0 reactions
kwhkim commented, Aug 30, 2021

Wow, this looks serious. I have another example.

>>> df.agg(np.size)
A    3
B    3
C    3
dtype: int64
>>> df.agg({'A':np.size})
   A
0  1
1  1
2  1

So df.agg({'A':}) is more like df.A.agg()?

>>> df.A.agg(np.size)
0    1
1    1
2    1
Name: A, dtype: int64
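
The column of 1s above is what np.size returns for a single scalar. That is consistent with the function being called once per element rather than once on the whole column, although this is an inference from the output and not something confirmed in the thread. A NumPy-only sketch of the difference (the array `column` is a stand-in for the values of column A):

```python
import numpy as np

column = np.array([1, 4, 7])  # stand-in for the values of df['A']

# Called on the whole array, np.size counts the elements.
print(np.size(column))

# Called on one element, np.size treats the scalar as a size-1 object.
print(np.size(column[0]))

# Calling it element by element reproduces the column of 1s shown above.
per_element = [np.size(value) for value in column]
print(per_element)
```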

It gets weirder

>>> df.groupby([1]*len(df)).agg({'A':np.size})  # [1]*len(df) puts every row into a single group labelled 1
   A
1  3
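
If the goal is just the mean, standard deviation and size of one ungrouped column, one way to sidestep the agg dispatch question is to call the Series methods directly. This is a workaround sketch, not an explanation of the behaviour above; `col` and `summary` are illustrative names, and Series.std uses the sample standard deviation (ddof=1) by default.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=["A", "B", "C"])

col = df["A"]

# Build the summary from Series methods and attributes instead of DataFrame.agg.
summary = pd.Series(
    {"mean": col.mean(), "std": col.std(), "size": col.size},
    name="A",
)
print(summary)
```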
