
BUG: Pandas groupby datetime and column then apply generates ValueError

Code Sample, a copy-pastable example if possible

    | SYM_ROOT | TIME_M                     | BEST_BID | BEST_ASK | increment | genjud_incre | 
    |----------|----------------------------|----------|----------|-----------|--------------| 
    | A        | 2017-01-03 09:30:00.004712 | 45.91    | 46.12    | 0         | 4680         | 
    | AA       | 2017-01-03 09:30:00.004014 | 28.55    | 28.57    | 0         | 4680         | 
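import pandas as pd

def group_increment_to_end(x):
    # Keep only the first row of each group
    return x.iloc[0:1]

df = pd.read_csv('stack.csv')
# Format string as given in the report; it must match how TIME_M is stored in stack.csv
df['TIME_M'] = pd.to_datetime(df['TIME_M'], format='%Y%m%d %H:%M:%S.%f')
# Group by symbol and by calendar date (a Series of datetime.date objects)
df.groupby(['SYM_ROOT', df['TIME_M'].dt.date]).apply(group_increment_to_end)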
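
Because stack.csv is not attached, here is a self-contained sketch that rebuilds the two sample rows from the table above using an inline CSV string. The CSV layout and the `'%Y-%m-%d %H:%M:%S.%f'` format are assumptions based on how the table displays TIME_M. If the maintainer comment further down is right, this should run without error on pandas 1.1.0 or later, while 0.22.0 raises the ValueError shown below.

```python
import io

import pandas as pd

# Hypothetical inline copy of the two sample rows from the issue (stack.csv is not attached)
CSV_TEXT = """SYM_ROOT,TIME_M,BEST_BID,BEST_ASK,increment,genjud_incre
A,2017-01-03 09:30:00.004712,45.91,46.12,0,4680
AA,2017-01-03 09:30:00.004014,28.55,28.57,0,4680
"""


def group_increment_to_end(x):
    # Keep only the first row of each group
    return x.iloc[0:1]


df = pd.read_csv(io.StringIO(CSV_TEXT))
# Assumed format: matches the hyphenated timestamps displayed in the table
df["TIME_M"] = pd.to_datetime(df["TIME_M"], format="%Y-%m-%d %H:%M:%S.%f")

# Group by symbol and by calendar date (the second key is a Series of datetime.date objects)
result = df.groupby(["SYM_ROOT", df["TIME_M"].dt.date]).apply(group_increment_to_end)
print(result)
```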

Problem description

I am trying to group my dataframe and then apply a function to each group. SYM_ROOT is a categorical variable, while TIME_M is a datetime variable.

However, I keep getting the following error:

ValueError: Key 2017-01-03 00:00:00 not in level Index([2017-01-03], dtype='object', name=u'TIME_M')

I am referring to this Stack Overflow post.

I think the reason is that pandas somehow doesn’t recognize the date objects in the column correctly when grouping: it wrongly compares the key 2017-01-03 00:00:00 against the TIME_M level and raises the error. After separating out the date into its own column, the problem goes away.

But I think it would be beneficial if pandas could recognize the date objects in the columns correctly …
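
The workaround described above (storing the date in its own column before grouping) might look like the first option in this sketch; the column name `DATE` and the sample rows are hypothetical. The second option is an alternative, also only a sketch: grouping on `dt.normalize()` floors each timestamp to midnight but keeps the `datetime64` dtype, so the group keys stay Timestamps instead of `datetime.date` objects. Neither variant has been checked here against pandas 0.22.0.

```python
import pandas as pd

# Hypothetical sample resembling the issue's data: two symbols over two trading days
df = pd.DataFrame(
    {
        "SYM_ROOT": ["A", "A", "AA", "A", "AA"],
        "TIME_M": pd.to_datetime(
            [
                "2017-01-03 09:30:00.004712",
                "2017-01-03 09:31:00.000000",
                "2017-01-03 09:30:00.004014",
                "2017-01-04 09:30:00.001000",
                "2017-01-04 09:30:00.002000",
            ]
        ),
        "BEST_BID": [45.91, 45.95, 28.55, 46.00, 28.60],
    }
)


def first_row(x):
    # Keep only the first row of each group
    return x.iloc[0:1]


# Option 1: store the calendar date in its own column ("DATE" is a hypothetical name)
df["DATE"] = df["TIME_M"].dt.date
by_column = df.groupby(["SYM_ROOT", "DATE"]).apply(first_row)

# Option 2: floor timestamps to midnight so the keys stay datetime64 instead of datetime.date
by_normalized = df.groupby(["SYM_ROOT", df["TIME_M"].dt.normalize()]).apply(first_row)

print(by_column)
print(by_normalized)
```

Option 2 avoids creating an extra column, and because the level values and the lookup keys then share one type, it sidesteps the date-versus-Timestamp comparison that appears in the error.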

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.14.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-693.21.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: None.None

pandas: 0.22.0
pytest: 3.3.2
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 5.4.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.4.10
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 12 (9 by maintainers)

Top GitHub Comments

1 reaction
hwalinga commented, Sep 23, 2020

@maxzinkus Yeah, this only started working in 1.1.0.
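
Building on that comment, a hypothetical helper `date_key` could choose the grouping key based on the installed version: `dt.date` on pandas 1.1.0 or later, and the `dt.normalize()` fallback otherwise. The version check is an assumption that reads only the major and minor numbers of `pd.__version__`; the data is the simpler example from the next comment.

```python
import pandas as pd


def pandas_at_least(major, minor):
    # Compare only the leading "major.minor" part of pd.__version__ (e.g. "1.1.0", "0.24.0.dev0+177...")
    parts = pd.__version__.split(".")[:2]
    return (int(parts[0]), int(parts[1])) >= (major, minor)


def date_key(times):
    # Hypothetical helper: per the comment above, grouping by .dt.date works from pandas 1.1.0 on
    if pandas_at_least(1, 1):
        return times.dt.date
    # Older versions: floor to midnight but keep datetime64 keys
    return times.dt.normalize()


df = pd.DataFrame(
    {
        "date": pd.date_range("2010-01-01", freq=pd.offsets.Hour(12), periods=5),
        "vals": range(5),
        "let": list("abcde"),
    }
)
result = df.groupby([df.let, date_key(df.date)]).apply(lambda x: x.iloc[0:])
print(result)
```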

1 reaction
mroeschke commented, Jun 27, 2018

Simpler example:

In [15]: df = pd.DataFrame({'date': pd.date_range('2010-01-01', freq='12H', periods=5), 'vals': range(5), 'let': list('abcde')})

In [16]: df
Out[16]:
                 date  vals let
0 2010-01-01 00:00:00     0   a
1 2010-01-01 12:00:00     1   b
2 2010-01-02 00:00:00     2   c
3 2010-01-02 12:00:00     3   d
4 2010-01-03 00:00:00     4   e

In [17]: df.groupby([df.let, df.date.dt.date]).apply(lambda x: x.iloc[0:])

In [18]: pd.__version__
Out[18]: '0.24.0.dev0+177.g45e55af'

From the traceback, it looks like the issue is in indexing a (Multi?)Index of dates with a Timestamp.

Investigation and PRs welcome!
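
The mismatch can be illustrated without touching pandas internals. The `date` level holds `datetime.date` objects, while the failing key is a `Timestamp`, which subclasses `datetime.datetime`. This sketch is only an analogy using the standard library, with a plain dict standing in for the hash table behind `get_loc`: a midnight `datetime` never matches the equivalent `date` key, because Python's `datetime` compares unequal to plain `date` objects.

```python
import datetime

# Level values as they appear in the error message: plain datetime.date objects
level = [datetime.date(2010, 1, 1), datetime.date(2010, 1, 2), datetime.date(2010, 1, 3)]
# Analogy only: a dict mapping each level value to its position, like a hash-based get_loc
lookup = {value: position for position, value in enumerate(level)}

# A midnight datetime (pandas' Timestamp is a datetime subclass) for the same calendar day
key = datetime.datetime(2010, 1, 1)

# datetime compares unequal to date, so the key is not found
print(key == level[0])          # False
print(lookup.get(key))          # None

# Converting the key back to a date restores the match
print(lookup.get(key.date()))   # 0
```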

Traceback

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2968             try:
-> 2969                 return self._engine.get_loc(key)
   2970             except KeyError:

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
    139
--> 140     cpdef get_loc(self, object val):
    141         if is_definitely_invalid_key(val):

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
    161         try:
--> 162             return self.mapping.get_item(val)
    163         except (TypeError, ValueError):

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
   1489
-> 1490     cpdef get_item(self, object val):
   1491         cdef khiter_t k

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
   1497         else:
-> 1498             raise KeyError(val)
   1499

KeyError: Timestamp('2010-01-01 00:00:00')

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/core/reshape/concat.py in _make_concat_multiindex(indexes, keys, levels, names)
    563                 try:
--> 564                     i = level.get_loc(key)
    565                 except KeyError:

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2970             except KeyError:
-> 2971                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2972

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
    139
--> 140     cpdef get_loc(self, object val):
    141         if is_definitely_invalid_key(val):

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
    161         try:
--> 162             return self.mapping.get_item(val)
    163         except (TypeError, ValueError):

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
   1489
-> 1490     cpdef get_item(self, object val):
   1491         cdef khiter_t k

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
   1497         else:
-> 1498             raise KeyError(val)
   1499

KeyError: Timestamp('2010-01-01 00:00:00')

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/core/groupby/groupby.py in apply(self, func, *args, **kwargs)
    917             try:
--> 918                 result = self._python_apply_general(f)
    919             except Exception:

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/core/groupby/groupby.py in _python_apply_general(self, f)
    940             values,
--> 941             not_indexed_same=mutated or self.mutated)
    942

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/core/groupby/groupby.py in _wrap_applied_output(self, keys, values, not_indexed_same)
   4220             return self._concat_objects(keys, values,
-> 4221                                         not_indexed_same=not_indexed_same)
   4222         elif self.grouper.groupings is not None:

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/core/groupby/groupby.py in _concat_objects(self, keys, values, not_indexed_same)
   1135                                 levels=group_levels, names=group_names,
-> 1136                                 sort=False)
   1137             else:

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/core/reshape/concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    224                        verify_integrity=verify_integrity,
--> 225                        copy=copy, sort=sort)
    226     return op.get_result()

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/core/reshape/concat.py in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy, sort)
    377
--> 378         self.new_axes = self._get_new_axes()
    379

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/core/reshape/concat.py in _get_new_axes(self)
    457
--> 458         new_axes[self.axis] = self._get_concat_axis()
    459         return new_axes

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/core/reshape/concat.py in _get_concat_axis(self)
    513             concat_axis = _make_concat_multiindex(indexes, self.keys,
--> 514                                                   self.levels, self.names)
    515

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/core/reshape/concat.py in _make_concat_multiindex(indexes, keys, levels, names)
    566                     raise ValueError('Key {key!s} not in level {level!s}'
--> 567                                      .format(key=key, level=level))
    568

ValueError: Key 2010-01-01 00:00:00 not in level Index([2010-01-01, 2010-01-02, 2010-01-03], dtype='object', name='date')

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2968             try:
-> 2969                 return self._engine.get_loc(key)
   2970             except KeyError:

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
    139
--> 140     cpdef get_loc(self, object val):
    141         if is_definitely_invalid_key(val):

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
    161         try:
--> 162             return self.mapping.get_item(val)
    163         except (TypeError, ValueError):

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
   1489
-> 1490     cpdef get_item(self, object val):
   1491         cdef khiter_t k

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
   1497         else:
-> 1498             raise KeyError(val)
   1499

KeyError: Timestamp('2010-01-01 00:00:00')

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/core/reshape/concat.py in _make_concat_multiindex(indexes, keys, levels, names)
    563                 try:
--> 564                     i = level.get_loc(key)
    565                 except KeyError:

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2970             except KeyError:
-> 2971                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2972

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
    139
--> 140     cpdef get_loc(self, object val):
    141         if is_definitely_invalid_key(val):

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
    161         try:
--> 162             return self.mapping.get_item(val)
    163         except (TypeError, ValueError):

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
   1489
-> 1490     cpdef get_item(self, object val):
   1491         cdef khiter_t k

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
   1497         else:
-> 1498             raise KeyError(val)
   1499

KeyError: Timestamp('2010-01-01 00:00:00')

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-17-e66fb9399357> in <module>()
----> 1 df.groupby([df.let, df.date.dt.date]).apply(lambda x: x.iloc[0:])

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/core/groupby/groupby.py in apply(self, func, *args, **kwargs)
    928
    929                 with _group_selection_context(self):
--> 930                     return self._python_apply_general(f)
    931
    932         return result

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/core/groupby/groupby.py in _python_apply_general(self, f)
    939             keys,
    940             values,
--> 941             not_indexed_same=mutated or self.mutated)
    942
    943     def _iterate_slices(self):

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/core/groupby/groupby.py in _wrap_applied_output(self, keys, values, not_indexed_same)
   4219         elif isinstance(v, DataFrame):
   4220             return self._concat_objects(keys, values,
-> 4221                                         not_indexed_same=not_indexed_same)
   4222         elif self.grouper.groupings is not None:
   4223             if len(self.grouper.groupings) > 1:

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/core/groupby/groupby.py in _concat_objects(self, keys, values, not_indexed_same)
   1134                 result = concat(values, axis=self.axis, keys=group_keys,
   1135                                 levels=group_levels, names=group_names,
-> 1136                                 sort=False)
   1137             else:
   1138

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/core/reshape/concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    223                        keys=keys, levels=levels, names=names,
    224                        verify_integrity=verify_integrity,
--> 225                        copy=copy, sort=sort)
    226     return op.get_result()
    227

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/core/reshape/concat.py in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy, sort)
    376         self.copy = copy
    377
--> 378         self.new_axes = self._get_new_axes()
    379
    380     def get_result(self):

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/core/reshape/concat.py in _get_new_axes(self)
    456                 new_axes[i] = ax
    457
--> 458         new_axes[self.axis] = self._get_concat_axis()
    459         return new_axes
    460

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/core/reshape/concat.py in _get_concat_axis(self)
    512         else:
    513             concat_axis = _make_concat_multiindex(indexes, self.keys,
--> 514                                                   self.levels, self.names)
    515
    516         self._maybe_check_integrity(concat_axis)

/mnt/c/Users/Matt Roeschke/Projects/pandas-mroeschke/pandas/core/reshape/concat.py in _make_concat_multiindex(indexes, keys, levels, names)
    565                 except KeyError:
    566                     raise ValueError('Key {key!s} not in level {level!s}'
--> 567                                      .format(key=key, level=level))
    568
    569                 to_concat.append(np.repeat(i, len(index)))

ValueError: Key 2010-01-01 00:00:00 not in level Index([2010-01-01, 2010-01-02, 2010-01-03], dtype='object', name='date')

