Series with NAMED period index raise error on groupby index.month (pandas 1.0 specific)
Edit from @TomAugspurger: this is fixed on master, but the example below needs to be added as a unit test. The test can probably go in groupby/test_groupby.py.
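A minimal sketch of such a unit test, in the pytest style (a plain test_ function with an assertion). The test name is hypothetical. The setup reuses the reproduction from the Code Sample, and the expected values are the pandas 0.25.3 output shown under Expected Output. pd.testing.assert_series_equal is the public pandas assertion helper.

```python
import pandas as pd


def test_groupby_month_with_named_period_index():
    # Hypothetical test name. Same setup as the Code Sample in this issue.
    index = pd.period_range(start="2018-01", periods=24, freq="M")
    periodSerie = pd.Series(range(24), index=index)
    periodSerie.index.name = "Month"

    # Raised DateParseError on pandas 1.0.1; should succeed once fixed.
    result = periodSerie.groupby(periodSerie.index.month).sum()

    # Expected output, as produced by pandas 0.25.3.
    expected = pd.Series(
        [12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34],
        index=pd.Index(list(range(1, 13)), name="Month"),
    )
    pd.testing.assert_series_equal(result, expected)
```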
Description
With pandas 1.0.1 (full version information with dependencies at the end), a Series with a NAMED PeriodIndex raises an error on groupby(index.month).
There is no error if the index is not named.
There was no error with pandas 0.25.3.
Code Sample
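import pandas as pd
index = pd.period_range(start='2018-01', periods=24, freq='M')  # 24 monthly periods, 2018-01 to 2019-12
periodSerie = pd.Series(range(24), index=index)
periodSerie.index.name = 'Month'  # naming the index is what triggers the error
periodSerie.groupby(periodSerie.index.month).sum()  # raises DateParseError on pandas 1.0.1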
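A possible workaround, assuming the report above holds that an unnamed index does not trigger the error: group by an unnamed copy of the month values and restore the name on the result. This is a sketch that has not been checked against 1.0.1 itself; Index.rename and Series.rename_axis are standard pandas methods.

```python
import pandas as pd

index = pd.period_range(start="2018-01", periods=24, freq="M")
periodSerie = pd.Series(range(24), index=index)
periodSerie.index.name = "Month"

# Workaround sketch: group by an unnamed copy of the month values so that
# groupby has no name to look up in the Series, then put the name back.
months = periodSerie.index.month.rename(None)
result = periodSerie.groupby(months).sum().rename_axis("Month")
print(result)
```

On a working pandas version, the printed result should match the 0.25.3 output shown under Expected Output.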
Error
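It seems to me that pandas tries to interpret the index name as if it were a label in the index itself. Judging from the traceback, the grouper check is_in_obj() evaluates obj[gpr.name], i.e. periodSerie['Month']. PeriodIndex.get_value then tries to parse 'Month' as a date string, and the resulting DateParseError is not caught by the except (KeyError, IndexError) clause.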
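The failing step can be reproduced without groupby by performing the same lookup that is_in_obj() does. This sketch assumes a pandas version where the regression is fixed. There the lookup raises KeyError, which is_in_obj() catches. On 1.0.1, the traceback shows it raising DateParseError instead.

```python
import pandas as pd

index = pd.period_range(start="2018-01", periods=24, freq="M", name="Month")
periodSerie = pd.Series(range(24), index=index)
gpr = periodSerie.index.month  # integer Index that inherits the name "Month"

# is_in_obj() evaluates obj[gpr.name]; here that is periodSerie["Month"].
try:
    periodSerie[gpr.name]
    outcome = "found"
except (KeyError, IndexError):
    # The exceptions is_in_obj() catches per the traceback: groupby carries on.
    outcome = "caught"
except Exception as err:
    # Anything else escapes from groupby (DateParseError on pandas 1.0.1).
    outcome = type(err).__name__

print(gpr.name, outcome)
```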
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
4410 try:
-> 4411 return libindex.get_value_at(s, key)
4412 except IndexError:
pandas/_libs/index.pyx in pandas._libs.index.get_value_at()
pandas/_libs/index.pyx in pandas._libs.index.get_value_at()
pandas/_libs/util.pxd in pandas._libs.util.get_value_at()
pandas/_libs/util.pxd in pandas._libs.util.validate_indexer()
TypeError: 'str' object cannot be interpreted as an integer
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/indexes/period.py in get_value(self, series, key)
516 try:
--> 517 value = super().get_value(s, key)
518 except (KeyError, IndexError):
~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
4418 else:
-> 4419 raise e1
4420 except Exception:
~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
4404 try:
-> 4405 return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
4406 except KeyError as e1:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()
KeyError: 'Month'
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string_with_reso()
pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.dateutil_parse()
ValueError: Unknown datetime string format, unable to parse: Month
During handling of the above exception, another exception occurred:
DateParseError Traceback (most recent call last)
<ipython-input-6-a3e948d22d88> in <module>
5 periodSerie = pd.Series(range(24),index=index)
6 periodSerie.index.name = 'Month'
----> 7 periodSerie.groupby(periodSerie.index.month).sum()
~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/series.py in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, observed)
1676 axis = self._get_axis_number(axis)
1677
-> 1678 return groupby_generic.SeriesGroupBy(
1679 obj=self,
1680 keys=by,
~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/groupby/groupby.py in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, observed, mutated)
400 from pandas.core.groupby.grouper import get_grouper
401
--> 402 grouper, exclusions, obj = get_grouper(
403 obj,
404 keys,
~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/groupby/grouper.py in get_grouper(obj, key, axis, level, sort, observed, mutated, validate)
583 for i, (gpr, level) in enumerate(zip(keys, levels)):
584
--> 585 if is_in_obj(gpr): # df.groupby(df['name'])
586 in_axis, name = True, gpr.name
587 exclusions.append(name)
~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/groupby/grouper.py in is_in_obj(gpr)
577 return False
578 try:
--> 579 return gpr is obj[gpr.name]
580 except (KeyError, IndexError):
581 return False
~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/series.py in __getitem__(self, key)
869 key = com.apply_if_callable(key, self)
870 try:
--> 871 result = self.index.get_value(self, key)
872
873 if not is_scalar(result):
~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/indexes/period.py in get_value(self, series, key)
518 except (KeyError, IndexError):
519 if isinstance(key, str):
--> 520 asdt, parsed, reso = parse_time_string(key, self.freq)
521 grp = resolution.Resolution.get_freq_group(reso)
522 freqn = resolution.get_freq_group(self.freq)
pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_time_string()
pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string_with_reso()
DateParseError: Unknown datetime string format, unable to parse: Month
Expected Output
With pandas 0.25.3, the following expected output is produced:
Month
1 12
2 14
3 16
4 18
5 20
6 22
7 24
8 26
9 28
10 30
11 32
12 34
dtype: int64
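The expected sums can be checked by hand. The 24 periods run from 2018-01 to 2019-12 and carry the values 0 to 23, so calendar month m collects m - 1 (2018) and m + 11 (2019), giving 2m + 10. A plain-Python sketch of that arithmetic:

```python
# Value i belongs to period 2018-01 + i months, i.e. calendar month (i % 12) + 1.
sums = {}
for value in range(24):
    month = value % 12 + 1
    sums[month] = sums.get(month, 0) + value

# Each month m receives m - 1 (2018) and m + 11 (2019), so the sum is 2m + 10.
assert sums == {m: 2 * m + 10 for m in range(1, 13)}
print([sums[m] for m in range(1, 13)])
```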
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.8.1.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.18-1-MANJARO
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : fr_FR.UTF-8
LOCALE : fr_FR.UTF-8

pandas : 1.0.1
numpy : 1.18.0
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 44.0.0
Cython : 0.29.15
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.11.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.2
numexpr : None
odfpy : None
openpyxl : 3.0.2
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
numba : None
Issue Analytics
- Created 4 years ago
- Comments:14 (14 by maintainers)
Top GitHub Comments
Too bad, my colleague and I have had a test ready since Monday. He referenced this issue in his commit (https://github.com/Tetras-Libre/pandas/commit/74e65a8f9372b893a3d228973fdd9472de971ff3), but I forgot to open the pull request 😦
@MarcoGorelli to review; the test should pass on master but fail on the 1.0.x branch.
This is also fixed on the 1.0.x branch by the backport of #34049 (i.e. in 1.0.4).
b3ebcb03287ed5c0082850b2bde8c7888ce6ac9b is the first new commit
commit b3ebcb03287ed5c0082850b2bde8c7888ce6ac9b
Author: Simon Hawkins <simonjayhawkins@gmail.com>
Date: Tue May 19 12:39:11 2020 +0100