Series with NAMED period index raise error on groupby index.month (pandas 1.0 specific)

edit from @TomAugspurger: this is fixed on master, but the example below needs to be added as a unit test. The test can probably go in groupby/test_groupby.py.

Description

With pandas 1.0.1 (full version and dependencies are listed at the end), a Series with a NAMED period index raises an error when grouped by index.month.

There is no error if the index is not named.

There was no error with pandas 0.25.3.
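Since the error only appears when the index is named, one possible workaround on affected versions is to drop the name before grouping and put it back on the result. The following is a minimal sketch, not a confirmed fix from the maintainers; the variable names `period_series`, `unnamed` and `result` are illustrative only.

```python
import pandas as pd

# Same data as in the report: 24 monthly periods starting January 2018.
index = pd.period_range(start="2018-01", periods=24, freq="M")
period_series = pd.Series(range(24), index=index)
period_series.index.name = "Month"

# Assumption: removing the index name avoids the failing lookup,
# as the report says an unnamed index does not raise.
unnamed = period_series.rename_axis(None)
result = unnamed.groupby(unnamed.index.month).sum()

# Restore the label that the named index would have produced.
result.index.name = "Month"
print(result)
```

The original series is left untouched, because `rename_axis` returns a new object.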

Code Sample

import pandas as pd

index = pd.period_range(start='2018-01', periods=24, freq='M')
periodSerie = pd.Series(range(24),index=index)
periodSerie.index.name = 'Month'
periodSerie.groupby(periodSerie.index.month).sum()

Error

It seems to me that pandas tries to interpret the index name as if it were part of the index itself.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
   4410             try:
-> 4411                 return libindex.get_value_at(s, key)
   4412             except IndexError:

pandas/_libs/index.pyx in pandas._libs.index.get_value_at()

pandas/_libs/index.pyx in pandas._libs.index.get_value_at()

pandas/_libs/util.pxd in pandas._libs.util.get_value_at()

pandas/_libs/util.pxd in pandas._libs.util.validate_indexer()

TypeError: 'str' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/indexes/period.py in get_value(self, series, key)
    516         try:
--> 517             value = super().get_value(s, key)
    518         except (KeyError, IndexError):

~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
   4418                 else:
-> 4419                     raise e1
   4420             except Exception:

~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
   4404         try:
-> 4405             return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
   4406         except KeyError as e1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()

KeyError: 'Month'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string_with_reso()

pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.dateutil_parse()

ValueError: Unknown datetime string format, unable to parse: Month

During handling of the above exception, another exception occurred:

DateParseError                            Traceback (most recent call last)
<ipython-input-6-a3e948d22d88> in <module>
      5 periodSerie = pd.Series(range(24),index=index)
      6 periodSerie.index.name = 'Month'
----> 7 periodSerie.groupby(periodSerie.index.month).sum()

~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/series.py in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, observed)
   1676         axis = self._get_axis_number(axis)
   1677 
-> 1678         return groupby_generic.SeriesGroupBy(
   1679             obj=self,
   1680             keys=by,

~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/groupby/groupby.py in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, observed, mutated)
    400             from pandas.core.groupby.grouper import get_grouper
    401 
--> 402             grouper, exclusions, obj = get_grouper(
    403                 obj,
    404                 keys,

~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/groupby/grouper.py in get_grouper(obj, key, axis, level, sort, observed, mutated, validate)
    583     for i, (gpr, level) in enumerate(zip(keys, levels)):
    584 
--> 585         if is_in_obj(gpr):  # df.groupby(df['name'])
    586             in_axis, name = True, gpr.name
    587             exclusions.append(name)

~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/groupby/grouper.py in is_in_obj(gpr)
    577             return False
    578         try:
--> 579             return gpr is obj[gpr.name]
    580         except (KeyError, IndexError):
    581             return False

~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/series.py in __getitem__(self, key)
    869         key = com.apply_if_callable(key, self)
    870         try:
--> 871             result = self.index.get_value(self, key)
    872 
    873             if not is_scalar(result):

~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/indexes/period.py in get_value(self, series, key)
    518         except (KeyError, IndexError):
    519             if isinstance(key, str):
--> 520                 asdt, parsed, reso = parse_time_string(key, self.freq)
    521                 grp = resolution.Resolution.get_freq_group(reso)
    522                 freqn = resolution.get_freq_group(self.freq)

pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_time_string()

pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string_with_reso()

DateParseError: Unknown datetime string format, unable to parse: Month
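The traceback suggests the failure starts in the `is_in_obj` check, which looks up `obj[gpr.name]` and only catches `KeyError` and `IndexError`. The sketch below reduces that check to the lines visible in the traceback; `is_in_obj_sketch` is a hypothetical stand-in, not the pandas function itself, and the outcome depends on the installed pandas version.

```python
import pandas as pd


def is_in_obj_sketch(obj, gpr):
    """Reduced imitation of the check shown in the traceback (hypothetical)."""
    try:
        # The grouping key carries the index name, so this becomes obj["Month"].
        return gpr is obj[gpr.name]
    except (KeyError, IndexError):
        return False


index = pd.period_range(start="2018-01", periods=24, freq="M")
period_series = pd.Series(range(24), index=index)
period_series.index.name = "Month"

# index.month inherits the name of the index it was derived from.
month_key = period_series.index.month

try:
    outcome = is_in_obj_sketch(period_series, month_key)
except Exception as exc:
    # On affected versions the string lookup is parsed as a date and the
    # resulting error is not caught by the except clause above.
    outcome = type(exc).__name__

print(month_key.name, outcome)
```

On a version where the lookup raises `KeyError`, the check simply returns `False` and the groupby proceeds.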

Expected Output

With pandas 0.25.3, the following expected output is produced:

Month
1     12
2     14
3     16
4     18
5     20
6     22
7     24
8     26
9     28
10    30
11    32
12    34
dtype: int64
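The note at the top asks for this example to be added as a unit test. A regression test along those lines might look like the sketch below; the test name is hypothetical and this is not necessarily the test that was eventually added to pandas. It checks the group labels and sums from the expected output above without depending on the integer dtype of the result index.

```python
import pandas as pd


def test_groupby_named_period_index_month():
    # Hypothetical test name; the data mirrors the code sample in the report.
    index = pd.period_range(start="2018-01", periods=24, freq="M")
    period_series = pd.Series(range(24), index=index)
    period_series.index.name = "Month"

    result = period_series.groupby(period_series.index.month).sum()

    # Each month appears twice (2018 and 2019): 0 + 12, 1 + 13, ..., 11 + 23.
    assert list(result.index) == list(range(1, 13))
    assert result.tolist() == list(range(12, 36, 2))


test_groupby_named_period_index_month()
```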

Output of pd.show_versions()

INSTALLED VERSIONS

commit           : None
python           : 3.8.1.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.4.18-1-MANJARO
machine          : x86_64
processor        :
byteorder        : little
LC_ALL           : None
LANG             : fr_FR.UTF-8
LOCALE           : fr_FR.UTF-8

pandas           : 1.0.1
numpy            : 1.18.0
pytz             : 2019.3
dateutil         : 2.8.1
pip              : 20.0.2
setuptools       : 44.0.0
Cython           : 0.29.15
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.10.3
IPython          : 7.11.1
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : 3.1.2
numexpr          : None
odfpy            : None
openpyxl         : 3.0.2
pandas_gbq       : None
pyarrow          : None
pytables         : None
pytest           : None
pyxlsb           : None
s3fs             : None
scipy            : 1.4.1
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : 1.2.0
xlwt             : None
xlsxwriter       : None
numba            : None

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 14 (14 by maintainers)

Top GitHub Comments

1 reaction
daxid commented, Mar 28, 2020

Too bad, my colleague and I had a test ready since Monday. He mentioned this thread in his commit (https://github.com/Tetras-Libre/pandas/commit/74e65a8f9372b893a3d228973fdd9472de971ff3) but I forgot to make the pull request 😦

@MarcoGorelli to review: the test should pass on master but fail on the 1.0.x branch.

0 reactions
simonjayhawkins commented, Jun 23, 2020

This is also fixed on the 1.0.x branch by the backport of #34049 (i.e. in 1.0.4).

b3ebcb03287ed5c0082850b2bde8c7888ce6ac9b is the first new commit
commit b3ebcb03287ed5c0082850b2bde8c7888ce6ac9b
Author: Simon Hawkins simonjayhawkins@gmail.com
Date: Tue May 19 12:39:11 2020 +0100

Backport PR #34049 on branch 1.0.x (Bug in Series.groupby would raise ValueError when grouping by PeriodIndex level) (#34247)