Aggregate fails with mixed types in grouping series

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

X = pd.DataFrame(data=np.random.rand(7, 3), columns=list('XYZ'), index=list('zxcvbnm'))
X['grouping'] = ['group 1', 'group 1', 'group 1', 2, 2, 2, 'group 1']
X.groupby('grouping').aggregate(lambda x: x.tolist())

This is the exception and traceback that the code above returns:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/Users/nicolaireeve/miniconda2/envs/skbiodev/lib/python3.4/site-packages/pandas/core/groupby.py in aggregate(self, arg, *args, **kwargs)
   3482                     result = self._aggregate_multiple_funcs(
-> 3483                         [arg], _level=_level, _axis=self.axis)
   3484                     result.columns = Index(

/Users/nicolaireeve/miniconda2/envs/skbiodev/lib/python3.4/site-packages/pandas/core/base.py in _aggregate_multiple_funcs(self, arg, _level, _axis)
    690         if not len(results):
--> 691             raise ValueError("no results")
    692 

ValueError: no results

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
/Users/nicolaireeve/miniconda2/envs/skbiodev/lib/python3.4/site-packages/pandas/core/groupby.py in _aggregate_generic(self, func, *args, **kwargs)
   3508                 for name, data in self:
-> 3509                     result[name] = self._try_cast(func(data, *args, **kwargs),
   3510                                                   data)

<ipython-input-25-18b24604e98f> in <lambda>(x)
      2 X['grouping'] = ['group 1', 'group 1', 'group 1', 2, 2 , 2, 'group 1']
----> 3 X.groupby('grouping').aggregate(lambda x: x.tolist())

/Users/nicolaireeve/miniconda2/envs/skbiodev/lib/python3.4/site-packages/pandas/core/generic.py in __getattr__(self, name)
   3080                 return self[name]
-> 3081             return object.__getattribute__(self, name)
   3082 

AttributeError: 'DataFrame' object has no attribute 'tolist'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-25-18b24604e98f> in <module>()
      1 X = pd.DataFrame(data=np.random.rand(7, 3), columns=list('XYZ'), index=list('zxcvbnm'))
      2 X['grouping'] = ['group 1', 'group 1', 'group 1', 2, 2 , 2, 'group 1']
----> 3 X.groupby('grouping').aggregate(lambda x: x.tolist())

/Users/nicolaireeve/miniconda2/envs/skbiodev/lib/python3.4/site-packages/pandas/core/groupby.py in aggregate(self, arg, *args, **kwargs)
   4034         versionadded=''))
   4035     def aggregate(self, arg, *args, **kwargs):
-> 4036         return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
   4037 
   4038     agg = aggregate

/Users/nicolaireeve/miniconda2/envs/skbiodev/lib/python3.4/site-packages/pandas/core/groupby.py in aggregate(self, arg, *args, **kwargs)
   3486                         name=self._selected_obj.columns.name)
   3487                 except:
-> 3488                     result = self._aggregate_generic(arg, *args, **kwargs)
   3489 
   3490         if not self.as_index:

/Users/nicolaireeve/miniconda2/envs/skbiodev/lib/python3.4/site-packages/pandas/core/groupby.py in _aggregate_generic(self, func, *args, **kwargs)
   3510                                                   data)
   3511             except Exception:
-> 3512                 return self._aggregate_item_by_item(func, *args, **kwargs)
   3513         else:
   3514             for name in self.indices:

/Users/nicolaireeve/miniconda2/envs/skbiodev/lib/python3.4/site-packages/pandas/core/groupby.py in _aggregate_item_by_item(self, func, *args, **kwargs)
   3554             # GH6337
   3555             if not len(result_columns) and errors is not None:
-> 3556                 raise errors
   3557 
   3558         return DataFrame(result, columns=result_columns)

/Users/nicolaireeve/miniconda2/envs/skbiodev/lib/python3.4/site-packages/pandas/core/groupby.py in _aggregate_item_by_item(self, func, *args, **kwargs)
   3539                                      grouper=self.grouper)
   3540                 result[item] = self._try_cast(
-> 3541                     colg.aggregate(func, *args, **kwargs), data)
   3542             except ValueError:
   3543                 cannot_agg.append(item)

/Users/nicolaireeve/miniconda2/envs/skbiodev/lib/python3.4/site-packages/pandas/core/groupby.py in aggregate(self, func_or_funcs, *args, **kwargs)
   2885                 result = self._aggregate_named(func_or_funcs, *args, **kwargs)
   2886 
-> 2887             index = Index(sorted(result), name=self.grouper.names[0])
   2888             ret = Series(result, index=index)
   2889 

TypeError: unorderable types: str() < int()

Problem description

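If the grouping vector contains mixed types and aggregate is called after groupby(…), an exception is raised. Execution reaches the line `index = Index(sorted(result), name=self.grouper.names[0])` (groupby.py line 2887 in the traceback above) and fails there because, in Python 3, sorted() cannot order values of mixed types such as str and int.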
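The failure can be reproduced without pandas at all. The snippet below is a minimal sketch that uses only built-in Python 3 behaviour; the variable names are illustrative and do not come from the pandas source.

```python
# The same mix of str and int that appears in the 'grouping' column.
mixed_keys = ['group 1', 2]

try:
    sorted(mixed_keys)
    sort_failed = False
except TypeError as exc:
    # Python 3 refuses to order a str against an int.
    sort_failed = True
    print("TypeError:", exc)

# Keys of a single type sort without trouble.
single_type_keys = sorted(['group 1', '2'])
print(single_type_keys)
```

This is why the traceback ends in a TypeError raised from sorted() rather than from the aggregation function itself.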

Expected Output

This is the output we would expect if the exception were not raised. It was produced by grouping on a column of a single type: here, 2 was changed to the string '2'.

X = pd.DataFrame(data=np.random.rand(7, 3), columns=list('XYZ'), index=list('zxcvbnm'))
X['grouping'] = ['group 1', 'group 1', 'group 1', '2', '2' , '2', 'group 1']
X.groupby('grouping').aggregate(lambda x: x.tolist())

                                                          X  \
grouping                                                      
2         [0.9219120799240533, 0.6439069401684864, 0.035...   
group 1   [0.6884732212797477, 0.326906484996646, 0.6718...   

                                                          Y  \
grouping                                                      
2         [0.7796923828539405, 0.7668459596180287, 0.868...   
group 1   [0.20259205506065203, 0.9138593138141587, 0.95...   

                                                          Z  
grouping                                                     
2         [0.9863526134877422, 0.6342347501171951, 0.873...  
group 1   [0.054465751087565906, 0.9026560581041934, 0.9...
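On an affected version, the same result can probably be reached without editing the data by hand. The following is a sketch of one possible workaround, assuming it is acceptable to cast every grouping key to a string; it is not a fix proposed in the issue.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame(data=np.random.rand(7, 3), columns=list('XYZ'), index=list('zxcvbnm'))
X['grouping'] = ['group 1', 'group 1', 'group 1', 2, 2, 2, 'group 1']

# Assumed workaround: make the keys a single type before grouping,
# so that sorting the group labels never compares str with int.
X['grouping'] = X['grouping'].astype(str)

result = X.groupby('grouping').aggregate(lambda x: x.tolist())
print(result)
```

Note that after the cast the integer 2 and the string '2' fall into the same group, which may not be what you want if both occur in the data.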

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.5.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 35.0.2
Cython: None
numpy: 1.13.1
scipy: 0.19.0
xarray: None
IPython: 6.0.0
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None

cc @ElDeveloper

Issue Analytics

Created 6 years ago
Comments: 6 (5 by maintainers)

Top GitHub Comments

jreback commented, Jul 14, 2017

simpler example

In [20]: Index([0, '1']).sort_values()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-20-c2949e5a9d7f> in <module>()
----> 1 Index([0, '1']).sort_values()

/Users/jreback/pandas/pandas/core/indexes/base.py in sort_values(self, return_indexer, ascending)
   2026         Return sorted copy of Index
   2027         """
-> 2028         _as = self.argsort()
   2029         if not ascending:
   2030             _as = _as[::-1]

/Users/jreback/pandas/pandas/core/indexes/base.py in argsort(self, *args, **kwargs)
   2089         if result is None:
   2090             result = np.array(self)
-> 2091         return result.argsort(*args, **kwargs)
   2092 
   2093     def __add__(self, other):

TypeError: '>' not supported between instances of 'str' and 'int'

So, to clean things up a bit: I would move pandas/core/algos/safe_sort to pandas/core/sorting. It can then be used selectively where needed (in a try/except).
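The try/except idea can be illustrated in plain Python. The helper below, `sort_keys_safely`, is a hypothetical name invented for this sketch; it is not pandas' `safe_sort` and makes no claim about how pandas orders mixed keys internally. It tries an ordinary sort first and only falls back to a type-aware ordering when that raises.

```python
def sort_keys_safely(keys):
    """Sort keys normally, falling back to a per-type sort for mixed types.

    Hypothetical helper for illustration only; not part of pandas.
    """
    keys = list(keys)
    try:
        return sorted(keys)
    except TypeError:
        # Mixed types: bucket the keys by type name, then sort each
        # bucket on its own so only like types are ever compared.
        by_type = {}
        for key in keys:
            by_type.setdefault(type(key).__name__, []).append(key)
        ordered = []
        for type_name in sorted(by_type):
            ordered.extend(sorted(by_type[type_name]))
        return ordered


print(sort_keys_safely(['group 1', 2, 10, 'a']))  # [2, 10, 'a', 'group 1']
print(sort_keys_safely([3, 1, 2]))                # [1, 2, 3]
```

Keeping the plain sorted() call as the first attempt means the common single-type case pays no extra cost; the fallback only runs for inputs that would otherwise raise.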

mroeschke commented, Oct 26, 2019

This looks to be fixed on master. Could use a test.

In [128]: X = pd.DataFrame(data=np.random.rand(7, 3), columns=list('XYZ'), index=list('zxcvbnm'))
     ...: X['grouping'] = ['group 1', 'group 1', 'group 1', 2, 2 , 2, 'group 1']
     ...: X.groupby('grouping').aggregate(lambda x: x.tolist())
Out[128]:
                                                          X  ...                                                  Z
grouping                                                     ...
2         [0.8198860820544791, 0.9156085166840109, 0.075...  ...  [0.928978831584153, 0.8276988600820108, 0.1694...
group 1   [0.2072740165365099, 0.5195836363398144, 0.038...  ...  [0.9497574283642745, 0.7137629888625677, 0.478...

[2 rows x 3 columns]

In [130]: pd.__version__
Out[130]: '0.26.0.dev0+682.g08ab156eb'
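A regression test along these lines might look like the sketch below. The test name and assertions are assumptions made for illustration, not the test that was eventually added to pandas, and it presumes a pandas version where the fix is present.

```python
import numpy as np
import pandas as pd


def test_aggregate_mixed_type_grouping():
    # Same frame as in the report: the grouping column mixes str and int.
    X = pd.DataFrame(data=np.random.rand(7, 3), columns=list('XYZ'), index=list('zxcvbnm'))
    X['grouping'] = ['group 1', 'group 1', 'group 1', 2, 2, 2, 'group 1']

    # Should no longer raise TypeError when the group labels are sorted.
    result = X.groupby('grouping').aggregate(lambda x: x.tolist())

    # Both keys survive, whatever order they are returned in.
    assert set(result.index) == {'group 1', 2}
    assert list(result.columns) == ['X', 'Y', 'Z']

    # Rows 'v', 'b', 'n' are the ones labelled with the integer key 2.
    assert result.loc[2, 'X'] == X.loc[['v', 'b', 'n'], 'X'].tolist()
```

The assertions deliberately avoid pinning the order of the group labels, since the thread does not state what ordering is guaranteed for mixed types.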