BUG: Series.groupby fails with InvalidIndexError on time series with a tuple-named grouper.
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pandas as pd
s = pd.Series(index=[pd.Timestamp(2021,7,26)], name=('A', 1))
s.groupby(s==s) # raises InvalidIndexError
Problem description
There are several things necessary to trigger this:
- The index has to consist of pd.Timestamps. If you change the example to s = pd.Series(index=[0,1], name=('A', 1)), everything is fine.
- The grouper, i.e. groupby's by argument, has to be a Series named with a tuple (like you get if you select a MultiIndex column from a DataFrame). If you change the example to s.groupby((s==s).rename('foo')), everything is fine. The same goes if you use a pure list or a function.
- The object to group must be a Series. If you change the example to pd.DataFrame(s).groupby(s==s), everything is fine.
- You need pandas>=1.1.3. If you downgrade to an older version, guess what, everything is fine.
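The conditions listed above can be checked side by side with a small sketch. The helper try_ngroups and the dictionary labels are hypothetical names introduced for illustration, and dtype=float is an added assumption that only avoids the empty-Series dtype warning. On an affected pandas version the tuple-named grouper should report the exception, while the renamed and list variants should group normally.

```python
import pandas as pd

# Same construction as the report; dtype=float is an added assumption
# that only avoids the empty-Series dtype warning.
s = pd.Series(index=[pd.Timestamp(2021, 7, 26)], name=("A", 1), dtype=float)
grouper = s == s  # boolean Series that inherits the tuple name ("A", 1)


def try_ngroups(obj, by):
    """Return the number of groups, or the exception's class name on failure."""
    try:
        return obj.groupby(by).ngroups
    except Exception as exc:  # InvalidIndexError on affected versions
        return type(exc).__name__


# Hypothetical labels for the grouper variants described in the list above.
results = {
    "tuple-named Series": try_ngroups(s, grouper),
    "renamed Series": try_ngroups(s, grouper.rename("foo")),
    "plain list": try_ngroups(s, grouper.tolist()),
}
for label, outcome in results.items():
    print(f"{label}: {outcome}")
```

Until a fix lands, renaming the grouper or passing its values as a list looks like the least invasive workaround, since neither changes the grouping itself.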
Why this is a problem: The name of a grouper should not matter, and it is hard to understand why a time series behaves differently from any other kind of Series.
Expected Output
Behavior should be as for non-time-series.
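As a rough illustration of the expected behavior, the sketch below uses the integer-index variant mentioned in the problem description, where a tuple-named grouper is reported to work. The dtype=float argument is an added assumption to avoid the empty-Series dtype warning. The time-series case would be expected to produce the analogous result.

```python
import pandas as pd

# Integer index instead of Timestamps; otherwise the same construction
# as the report (dtype=float is an added assumption).
s = pd.Series(index=[0, 1], name=("A", 1), dtype=float)

# The comparison keeps the tuple name ("A", 1). All values are NaN,
# so every element compares unequal to itself and lands in one group.
grouper = s == s
sizes = s.groupby(grouper).size()
print(sizes)
```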
Output of pd.show_versions()
INSTALLED VERSIONS
commit : db08276bc116c438d3fdee492026f8223584c477
python : 3.8.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 13, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : de_DE.cp1252
pandas : 1.1.3
numpy : 1.20.3
pytz : 2021.1
dateutil : 2.8.2
pip : 21.1.3
setuptools : 52.0.0.post20210125
Cython : None
pytest : 6.2.4
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : 1.3.2
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : 2.7.3
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None
Issue Analytics
- Created 2 years ago
- Comments:6 (6 by maintainers)
I don’t tend to put older regressions on the current backport milestone, as it’s more of a tracker than a project task list. If we get a PR to fix this, we can then assess whether we can/want to backport. Regressions and bug fixes are allowable in a patch release.
Sure, count me in. Never contributed here before. So I have to dig into your tests first. But I’ll go for it.