
ValueError when grouping with max/min as aggregation functions (pandas-1.0.1)

See original GitHub issue

Code Sample

import pandas as pd # 1.0.1
import numpy as np # 1.18.1

# Simple test case that fails
df_simple = pd.DataFrame({'key': [1,1,2,2,3,3], 'data' : [10,20,30,40,50,60],
                          'good_string' : ['cat','dog','cat','dog','fish','pig'],
                          'bad_string' : ['cat',np.nan,np.nan, np.nan, np.nan, np.nan]})

df_simple_max = df_simple.groupby(['key'], as_index=False, sort=False).max()

Traceback (most recent call last):
  line 17, in <module>
    df_simple_max = df_simple.groupby(['key'], as_index=False, sort=False).max()

  File "/usr/local/anaconda3/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1378, in f
    return self._cython_agg_general(alias, alt=npfunc, **kwargs)

  File "/usr/local/anaconda3/lib/python3.7/site-packages/pandas/core/groupby/generic.py", line 1004, in _cython_agg_general
    how, alt=alt, numeric_only=numeric_only, min_count=min_count

  File "/usr/local/anaconda3/lib/python3.7/site-packages/pandas/core/groupby/generic.py", line 1099, in _cython_agg_blocks
    agg_block: Block = block.make_block(result)

  File "/usr/local/anaconda3/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 273, in make_block
    return make_block(values, placement=placement, ndim=self.ndim)

  File "/usr/local/anaconda3/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 3041, in make_block
    return klass(values, ndim=ndim, placement=placement)

  File "/usr/local/anaconda3/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 2589, in __init__
    super().__init__(values, ndim=ndim, placement=placement)

  File "/usr/local/anaconda3/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 125, in __init__
    f"Wrong number of items passed {len(self.values)}, "

ValueError: Wrong number of items passed 1, placement implies 2
-------------

# Add one more legitimate string value to the 'bad_string' column and it works
df_simple = pd.DataFrame({'key': [1,1,2,2,3,3], 'data' : [10,20,30,40,50,60],
                          'good_string' : ['cat','dog','cat','dog','fish','pig'],
                          'bad_string' : ['cat','dog',np.nan, np.nan, np.nan, np.nan]})

df_simple_max = df_simple.groupby(['key'], as_index=False, sort=False).max()

Problem description

Unless I’ve misunderstood something fundamental about the max and min aggregation functions, I don’t think they should error out when a Series in the DataFrame is of object dtype and contains all but one NaN value. Note from the example above that adding just one more non-NaN value to the offending Series avoids the ValueError.
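In the meantime, one way to sidestep the error (a workaround sketch, not an official fix) is to leave the mostly-NaN object column out of the groupby entirely and aggregate only the columns known to behave:

```python
import numpy as np
import pandas as pd

df_simple = pd.DataFrame({'key': [1, 1, 2, 2, 3, 3],
                          'data': [10, 20, 30, 40, 50, 60],
                          'good_string': ['cat', 'dog', 'cat', 'dog', 'fish', 'pig'],
                          'bad_string': ['cat', np.nan, np.nan, np.nan, np.nan, np.nan]})

# Drop the offending 'bad_string' column before grouping; the remaining
# columns aggregate without hitting the block-placement ValueError.
safe_cols = ['key', 'data', 'good_string']
df_simple_max = df_simple[safe_cols].groupby('key', as_index=False, sort=False).max()
print(df_simple_max)
```

The cost is that `bad_string` is simply absent from the result, which is acceptable here since its per-group max would be NaN for every group but one anyway.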

Expected Output

No ValueError; groupby object returned.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.3.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 19.2.0
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : en_US.UTF-8
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.0.1
numpy            : 1.18.1
pytz             : 2019.3
dateutil         : 2.6.1
pip              : 20.0.2
setuptools       : 45.2.0.post20200210
Cython           : 0.29.15
pytest           : 5.3.5
hypothesis       : 5.4.1
sphinx           : 2.4.0
blosc            : None
feather          : None
xlsxwriter       : 1.2.7
lxml.etree       : 4.5.0
html5lib         : 1.0.1
pymysql          : None
psycopg2         : None
jinja2           : 2.11.1
IPython          : 7.12.0
pandas_datareader: None
bs4              : 4.8.2
bottleneck       : 1.3.1
fastparquet      : None
gcsfs            : None
lxml.etree       : 4.5.0
matplotlib       : 3.1.3
numexpr          : 2.7.1
odfpy            : None
openpyxl         : 3.0.3
pandas_gbq       : None
pyarrow          : None
pytables         : None
pytest           : 5.3.5
pyxlsb           : None
s3fs             : 0.4.0
scipy            : 1.4.1
sqlalchemy       : 1.3.13
tables           : 3.6.1
tabulate         : None
xarray           : None
xlrd             : 1.2.0
xlwt             : 1.3.0
xlsxwriter       : 1.2.7
numba            : 0.43.1

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 2
  • Comments: 9 (5 by maintainers)

Top GitHub Comments

1 reaction
dz0 commented, Jun 2, 2021

I am still getting this in 1.2.4

$ pip show pandas
Name: pandas
Version: 1.2.4

Traceback for the same example:

  File "/home/jurgis/.config/JetBrains/PyCharm2020.3/scratches/scratch_76.py", line 9, in <module>
    df_simple_max = df_simple.groupby(['key'], as_index=False, sort=False).max()
  File "/home/jurgis/miniconda3/envs/debitum-portfolio/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 1676, in max
    return self._agg_general(
  File "/home/jurgis/miniconda3/envs/debitum-portfolio/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 1024, in _agg_general
    result = self._cython_agg_general(
  File "/home/jurgis/miniconda3/envs/debitum-portfolio/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 1015, in _cython_agg_general
    agg_mgr = self._cython_agg_blocks(
  File "/home/jurgis/miniconda3/envs/debitum-portfolio/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 1118, in _cython_agg_blocks
    new_mgr = data.apply(blk_func, ignore_failures=True)
  File "/home/jurgis/miniconda3/envs/debitum-portfolio/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 425, in apply
    applied = b.apply(f, **kwargs)
  File "/home/jurgis/miniconda3/envs/debitum-portfolio/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 380, in apply
    return self._split_op_result(result)
  File "/home/jurgis/miniconda3/envs/debitum-portfolio/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 416, in _split_op_result
    result = self.make_block(result)
  File "/home/jurgis/miniconda3/envs/debitum-portfolio/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 286, in make_block
    return make_block(values, placement=placement, ndim=self.ndim)
  File "/home/jurgis/miniconda3/envs/debitum-portfolio/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 2742, in make_block
    return klass(values, ndim=ndim, placement=placement)
  File "/home/jurgis/miniconda3/envs/debitum-portfolio/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 142, in __init__
    raise ValueError(
ValueError: Wrong number of items passed 1, placement implies 2
0 reactions
mroeschke commented, Dec 28, 2021

Actually, given the new deprecation (non-numeric columns need to be de-selected before calling max), this behavior will be removed in 2.0, and aggregating on only numeric columns has good test coverage. Closing as this behavior is deprecated.
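For readers landing here on a recent pandas, the pattern that the deprecation asks for looks roughly like this (a sketch; the `numeric_only` keyword and `select_dtypes` selection shown are standard pandas API, but check the docs for your exact version):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'key': [1, 1, 2, 2, 3, 3],
                   'data': [10, 20, 30, 40, 50, 60],
                   'bad_string': ['cat', np.nan, np.nan, np.nan, np.nan, np.nan]})

# Option 1: ask the aggregation itself to skip non-numeric columns.
out1 = df.groupby('key', as_index=False, sort=False).max(numeric_only=True)

# Option 2: de-select non-numeric columns up front ('key' is numeric,
# so it survives the dtype filter and can still be grouped on).
out2 = df.select_dtypes(include='number').groupby('key', as_index=False, sort=False).max()
```

Both options produce the same numeric-only result; the object column never reaches the cython aggregation path, so the ValueError cannot occur.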


Top Results From Across the Web

Group By: split-apply-combine — pandas 1.0.3 documentation
Aggregation : compute a summary statistic (or statistics) for each group. ... both a column name and an index level name, a ValueError...
Pandas Dataframe groupby aggregate functions and ...
This is because by adding in another aggregator, you're asking pandas to find the min and max twice for each group. Once for...
Source code for pyspark.pandas.groupby - Apache Spark
A wrapper for GroupedData to behave similar to pandas GroupBy. ... It can also be used when applying multiple aggregation functions to specific...
GroupBy One Column and Get Mean, Min, and Max values
We can use Groupby function to split dataframe into groups and apply different operations on it. One of them is Aggregation.
Data Analysis with Python
In NumPy, you could choose to operate on the entire array, or a particular axis with the keyword argument axis=n . NumPy's Aggregate...
