Aggregate fails with mixed types in grouping series

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

X = pd.DataFrame(data=np.random.rand(7, 3), columns=list('XYZ'), index=list('zxcvbnm'))
X['grouping'] = ['group 1', 'group 1', 'group 1', 2, 2, 2, 'group 1']
X.groupby('grouping').aggregate(lambda x: x.tolist())

This is the exception and traceback that the code above returns:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/Users/nicolaireeve/miniconda2/envs/skbiodev/lib/python3.4/site-packages/pandas/core/groupby.py in aggregate(self, arg, *args, **kwargs)
   3482                     result = self._aggregate_multiple_funcs(
-> 3483                         [arg], _level=_level, _axis=self.axis)
   3484                     result.columns = Index(

/Users/nicolaireeve/miniconda2/envs/skbiodev/lib/python3.4/site-packages/pandas/core/base.py in _aggregate_multiple_funcs(self, arg, _level, _axis)
    690         if not len(results):
--> 691             raise ValueError("no results")
    692 

ValueError: no results

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
/Users/nicolaireeve/miniconda2/envs/skbiodev/lib/python3.4/site-packages/pandas/core/groupby.py in _aggregate_generic(self, func, *args, **kwargs)
   3508                 for name, data in self:
-> 3509                     result[name] = self._try_cast(func(data, *args, **kwargs),
   3510                                                   data)

<ipython-input-25-18b24604e98f> in <lambda>(x)
      2 X['grouping'] = ['group 1', 'group 1', 'group 1', 2, 2 , 2, 'group 1']
----> 3 X.groupby('grouping').aggregate(lambda x: x.tolist())

/Users/nicolaireeve/miniconda2/envs/skbiodev/lib/python3.4/site-packages/pandas/core/generic.py in __getattr__(self, name)
   3080                 return self[name]
-> 3081             return object.__getattribute__(self, name)
   3082 

AttributeError: 'DataFrame' object has no attribute 'tolist'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-25-18b24604e98f> in <module>()
      1 X = pd.DataFrame(data=np.random.rand(7, 3), columns=list('XYZ'), index=list('zxcvbnm'))
      2 X['grouping'] = ['group 1', 'group 1', 'group 1', 2, 2 , 2, 'group 1']
----> 3 X.groupby('grouping').aggregate(lambda x: x.tolist())

/Users/nicolaireeve/miniconda2/envs/skbiodev/lib/python3.4/site-packages/pandas/core/groupby.py in aggregate(self, arg, *args, **kwargs)
   4034         versionadded=''))
   4035     def aggregate(self, arg, *args, **kwargs):
-> 4036         return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
   4037 
   4038     agg = aggregate

/Users/nicolaireeve/miniconda2/envs/skbiodev/lib/python3.4/site-packages/pandas/core/groupby.py in aggregate(self, arg, *args, **kwargs)
   3486                         name=self._selected_obj.columns.name)
   3487                 except:
-> 3488                     result = self._aggregate_generic(arg, *args, **kwargs)
   3489 
   3490         if not self.as_index:

/Users/nicolaireeve/miniconda2/envs/skbiodev/lib/python3.4/site-packages/pandas/core/groupby.py in _aggregate_generic(self, func, *args, **kwargs)
   3510                                                   data)
   3511             except Exception:
-> 3512                 return self._aggregate_item_by_item(func, *args, **kwargs)
   3513         else:
   3514             for name in self.indices:

/Users/nicolaireeve/miniconda2/envs/skbiodev/lib/python3.4/site-packages/pandas/core/groupby.py in _aggregate_item_by_item(self, func, *args, **kwargs)
   3554             # GH6337
   3555             if not len(result_columns) and errors is not None:
-> 3556                 raise errors
   3557 
   3558         return DataFrame(result, columns=result_columns)

/Users/nicolaireeve/miniconda2/envs/skbiodev/lib/python3.4/site-packages/pandas/core/groupby.py in _aggregate_item_by_item(self, func, *args, **kwargs)
   3539                                      grouper=self.grouper)
   3540                 result[item] = self._try_cast(
-> 3541                     colg.aggregate(func, *args, **kwargs), data)
   3542             except ValueError:
   3543                 cannot_agg.append(item)

/Users/nicolaireeve/miniconda2/envs/skbiodev/lib/python3.4/site-packages/pandas/core/groupby.py in aggregate(self, func_or_funcs, *args, **kwargs)
   2885                 result = self._aggregate_named(func_or_funcs, *args, **kwargs)
   2886 
-> 2887             index = Index(sorted(result), name=self.grouper.names[0])
   2888             ret = Series(result, index=index)
   2889 

TypeError: unorderable types: str() < int()

Problem description

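If the grouping column contains mixed types and aggregate is called after groupby(…), an exception is raised. Execution reaches the line index = Index(sorted(result), name=self.grouper.names[0]) (groupby.py line 2887 in the traceback above) and fails there, because under Python 3 sorted() cannot order str and int values against each other.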
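
The failure can be reproduced without pandas. The following is a minimal sketch that uses only built-in Python; the variable names are illustrative and do not come from the pandas source.

```python
# Group labels with the same mix of str and int as the 'grouping' column.
mixed_labels = ['group 1', 2]

try:
    sorted(mixed_labels)
    mixed_sort_failed = False
except TypeError as exc:
    # Python 3 refuses to compare str with int.
    mixed_sort_failed = True
    print("sorted() failed:", exc)

# When every label has the same type, sorting works as usual.
uniform_labels = ['group 1', '2']
print(sorted(uniform_labels))
```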

Expected Output

This is what we would expect to see if the exception were not raised. The output below was produced by grouping on a column of a single type: in this instance, the integer 2 was changed to the string '2'.

X = pd.DataFrame(data=np.random.rand(7, 3), columns=list('XYZ'), index=list('zxcvbnm'))
X['grouping'] = ['group 1', 'group 1', 'group 1', '2', '2' , '2', 'group 1']
X.groupby('grouping').aggregate(lambda x: x.tolist())

                                                          X  \
grouping                                                      
2         [0.9219120799240533, 0.6439069401684864, 0.035...   
group 1   [0.6884732212797477, 0.326906484996646, 0.6718...   

                                                          Y  \
grouping                                                      
2         [0.7796923828539405, 0.7668459596180287, 0.868...   
group 1   [0.20259205506065203, 0.9138593138141587, 0.95...   

                                                          Z  
grouping                                                     
2         [0.9863526134877422, 0.6342347501171951, 0.873...  
group 1   [0.054465751087565906, 0.9026560581041934, 0.9...  
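
On an affected pandas version, the same idea can be applied without editing the data by hand. This is a sketch of one possible workaround, assuming it is acceptable for all group labels to become strings; it casts the grouping column with astype(str) before grouping.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame(data=np.random.rand(7, 3), columns=list('XYZ'), index=list('zxcvbnm'))
X['grouping'] = ['group 1', 'group 1', 'group 1', 2, 2, 2, 'group 1']

# Assumption: string labels are acceptable, so 2 becomes '2'.
X['grouping'] = X['grouping'].astype(str)

# With a single label type, the group keys can be sorted.
result = X.groupby('grouping').aggregate(lambda x: x.tolist())
print(result.index.tolist())
```

The integer label is lost in the result index, so this only fits cases where the distinction between 2 and '2' does not matter.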

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.5.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 35.0.2
Cython: None
numpy: 1.13.1
scipy: 0.19.0
xarray: None
IPython: 6.0.0
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None

cc @ElDeveloper

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

jreback commented, Jul 14, 2017

simpler example

In [20]: Index([0, '1']).sort_values()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-20-c2949e5a9d7f> in <module>()
----> 1 Index([0, '1']).sort_values()

/Users/jreback/pandas/pandas/core/indexes/base.py in sort_values(self, return_indexer, ascending)
   2026         Return sorted copy of Index
   2027         """
-> 2028         _as = self.argsort()
   2029         if not ascending:
   2030             _as = _as[::-1]

/Users/jreback/pandas/pandas/core/indexes/base.py in argsort(self, *args, **kwargs)
   2089         if result is None:
   2090             result = np.array(self)
-> 2091         return result.argsort(*args, **kwargs)
   2092 
   2093     def __add__(self, other):

TypeError: '>' not supported between instances of 'str' and 'int'

So, to clean things up: I would move pandas/core/algos/safe_sort to pandas/core/sorting (just to tidy up a bit). Then it can be used selectively where needed (in a try/except).
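
The try/except pattern suggested here could look roughly like the sketch below. It is written in plain Python and is not pandas' safe_sort; the helper name sort_with_fallback and the choice to place non-string labels before strings are assumptions made for illustration.

```python
def sort_with_fallback(values):
    """Hypothetical helper: try a normal sort, fall back for mixed types."""
    values = list(values)
    try:
        return sorted(values)
    except TypeError:
        # Fallback: non-strings first, then strings, each sorted among
        # themselves. This still fails if the non-string values cannot be
        # compared with each other.
        return sorted(values, key=lambda v: (isinstance(v, str), v))


print(sort_with_fallback(['group 1', 2, 'a', 1]))
print(sort_with_fallback([3, 1, 2]))
```

Only the mixed-type case takes the fallback, so the common case of uniform labels goes through the ordinary sort.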

mroeschke commented, Oct 26, 2019

This looks to be fixed on master. Could use a test.

In [128]: X = pd.DataFrame(data=np.random.rand(7, 3), columns=list('XYZ'), index=list('zxcvbnm'))
     ...: X['grouping'] = ['group 1', 'group 1', 'group 1', 2, 2 , 2, 'group 1']
     ...: X.groupby('grouping').aggregate(lambda x: x.tolist())
Out[128]:
                                                          X  ...                                                  Z
grouping                                                     ...
2         [0.8198860820544791, 0.9156085166840109, 0.075...  ...  [0.928978831584153, 0.8276988600820108, 0.1694...
group 1   [0.2072740165365099, 0.5195836363398144, 0.038...  ...  [0.9497574283642745, 0.7137629888625677, 0.478...

[2 rows x 3 columns]

In [130]: pd.__version__
Out[130]: '0.26.0.dev0+682.g08ab156eb'