ValueError when grouping with max/min as aggregation functions (pandas-1.0.1)
Code Sample
import pandas as pd # 1.0.1
import numpy as np # 1.18.1
# Simple test case that fails
df_simple = pd.DataFrame({'key': [1, 1, 2, 2, 3, 3], 'data': [10, 20, 30, 40, 50, 60],
                          'good_string': ['cat', 'dog', 'cat', 'dog', 'fish', 'pig'],
                          'bad_string': ['cat', np.nan, np.nan, np.nan, np.nan, np.nan]})
df_simple_max = df_simple.groupby(['key'], as_index=False, sort=False).max()
  line 17, in <module>
    df_simple_max = df_simple.groupby(['key'], as_index=False, sort=False).max()
  File "/usr/local/anaconda3/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1378, in f
    return self._cython_agg_general(alias, alt=npfunc, **kwargs)
  File "/usr/local/anaconda3/lib/python3.7/site-packages/pandas/core/groupby/generic.py", line 1004, in _cython_agg_general
    how, alt=alt, numeric_only=numeric_only, min_count=min_count
  File "/usr/local/anaconda3/lib/python3.7/site-packages/pandas/core/groupby/generic.py", line 1099, in _cython_agg_blocks
    agg_block: Block = block.make_block(result)
  File "/usr/local/anaconda3/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 273, in make_block
    return make_block(values, placement=placement, ndim=self.ndim)
  File "/usr/local/anaconda3/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 3041, in make_block
    return klass(values, ndim=ndim, placement=placement)
  File "/usr/local/anaconda3/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 2589, in __init__
    super().__init__(values, ndim=ndim, placement=placement)
  File "/usr/local/anaconda3/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 125, in __init__
    f"Wrong number of items passed {len(self.values)}, "
ValueError: Wrong number of items passed 1, placement implies 2
-------------
# Add one more legitimate string value to the 'bad_string' column and it works
df_simple = pd.DataFrame({'key': [1, 1, 2, 2, 3, 3], 'data': [10, 20, 30, 40, 50, 60],
                          'good_string': ['cat', 'dog', 'cat', 'dog', 'fish', 'pig'],
                          'bad_string': ['cat', 'dog', np.nan, np.nan, np.nan, np.nan]})
df_simple_max = df_simple.groupby(['key'], as_index=False, sort=False).max()
Problem description
Unless I’ve misunderstood something fundamental about the max and min aggregation functions, I don’t think they should raise an error when an object-dtype Series in the DataFrame is NaN in all but one of its values. As the example above shows, adding just one more non-NaN value to the offending Series avoids the ValueError.
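The pattern described above can be checked for before aggregating. The sketch below uses a hypothetical helper, `columns_matching_report`, which flags non-numeric columns (object dtype in pandas 1.0.1) that hold exactly one non-NaN value. It only encodes what the reporter observed, not a confirmed root cause, so other inputs may still hit the same error.

```python
import numpy as np
import pandas as pd


def columns_matching_report(df: pd.DataFrame) -> list:
    """Hypothetical helper: return the non-numeric columns that contain
    exactly one non-NaN value, i.e. the pattern described in this report."""
    return [
        col
        for col in df.columns
        if not pd.api.types.is_numeric_dtype(df[col]) and df[col].notna().sum() == 1
    ]


# The failing and working frames from the code sample above.
failing = pd.DataFrame({
    'key': [1, 1, 2, 2, 3, 3],
    'data': [10, 20, 30, 40, 50, 60],
    'good_string': ['cat', 'dog', 'cat', 'dog', 'fish', 'pig'],
    'bad_string': ['cat', np.nan, np.nan, np.nan, np.nan, np.nan],
})
working = failing.assign(bad_string=['cat', 'dog', np.nan, np.nan, np.nan, np.nan])

print(columns_matching_report(failing))  # ['bad_string']
print(columns_matching_report(working))  # []
```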
Expected Output
No ValueError; the aggregated DataFrame is returned.
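As a reference for what the expected result could look like, the sketch below computes the per-group maximum of every non-key column with plain Python loops, skipping NaN and leaving NaN where a group has no values in a column. `group_max_skipna` is a hypothetical name, and this is far slower than the built-in aggregation; it only illustrates the intended result, not how pandas computes it.

```python
import numpy as np
import pandas as pd


def group_max_skipna(df: pd.DataFrame, key: str) -> pd.DataFrame:
    """Hypothetical reference implementation: per-group max of each non-key
    column, ignoring NaN; a group with no values in a column gets NaN."""
    rows = []
    for group_key, group in df.groupby(key, sort=False):
        row = {key: group_key}
        for col in df.columns.drop(key):
            values = group[col].dropna()
            row[col] = values.max() if len(values) else np.nan
        rows.append(row)
    return pd.DataFrame(rows)


df_simple = pd.DataFrame({
    'key': [1, 1, 2, 2, 3, 3],
    'data': [10, 20, 30, 40, 50, 60],
    'good_string': ['cat', 'dog', 'cat', 'dog', 'fish', 'pig'],
    'bad_string': ['cat', np.nan, np.nan, np.nan, np.nan, np.nan],
})

# key 1 -> data 20, good_string 'dog', bad_string 'cat';
# keys 2 and 3 -> bad_string NaN because those groups hold no strings.
print(group_max_skipna(df_simple, 'key'))
```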
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Darwin
OS-release : 19.2.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.6.1
pip : 20.0.2
setuptools : 45.2.0.post20200210
Cython : 0.29.15
pytest : 5.3.5
hypothesis : 5.4.1
sphinx : 2.4.0
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.1
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.1.3
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.3.5
pyxlsb : None
s3fs : 0.4.0
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.7
numba : 0.43.1
Top GitHub Comments
I am still getting this in 1.2.4
Traceback for the same example
Actually, given the new deprecation (non-numeric columns need to be deselected before calling max), this behavior will be removed in 2.0, and aggregating on numeric-only columns already has good test coverage. Closing, as this behavior is deprecated.
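The deselection the maintainer refers to can be sketched as follows: keep only the numeric columns (here via `DataFrame.select_dtypes`) before grouping, so `max` never sees the object columns. This drops the string columns from the result, so it only fits when those columns are not needed.

```python
import numpy as np
import pandas as pd

df_simple = pd.DataFrame({
    'key': [1, 1, 2, 2, 3, 3],
    'data': [10, 20, 30, 40, 50, 60],
    'good_string': ['cat', 'dog', 'cat', 'dog', 'fish', 'pig'],
    'bad_string': ['cat', np.nan, np.nan, np.nan, np.nan, np.nan],
})

# Keep numeric columns only ('key' and 'data'); both string columns are dropped.
numeric = df_simple.select_dtypes(include='number')
df_simple_max = numeric.groupby('key', as_index=False, sort=False).max()
print(df_simple_max)
```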