BUG: not possible to 'cumsum' Timedelta with named aggregation

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import pandas as pd
import numpy as np
import random

# Dataset
ts = pd.DatetimeIndex([pd.Timestamp('2021/01/01 00:30'),
                       pd.Timestamp('2021/01/01 00:45'),
                       pd.Timestamp('2021/01/01 02:00'),
                       pd.Timestamp('2021/01/01 03:50'),
                       pd.Timestamp('2021/01/01 05:00')])
length = len(ts)
random.seed(1)
value = random.sample(range(1, length+1), length)
df = pd.DataFrame({'value': value, 'ts': ts})
df['td'] = df['ts'] - df['ts'].shift(1, fill_value=ts[0]-pd.Timedelta('1h'))
df['amount'] = df['value']*10
df.loc[:2,'grps'] = 'a'
df.loc[2:,'grps'] = 'b'

In [16]: df
Out[16]: 
   value                  ts              td  amount grps
0      2 2021-01-01 00:30:00 0 days 01:00:00      20    a
1      1 2021-01-01 00:45:00 0 days 00:15:00      10    a
2      5 2021-01-01 02:00:00 0 days 01:15:00      50    b
3      4 2021-01-01 03:50:00 0 days 01:50:00      40    b
4      3 2021-01-01 05:00:00 0 days 01:10:00      30    b

# cumsum and Timedelta work when used through pd.Series
df['td'].cumsum()

# cumsum and Timedelta do not work when used in groupby with named aggregation
agg_rules = {'td':('td', 'cumsum')}
res = df.groupby('grps').agg(**agg_rules)

Error message that is returned

res = df.groupby('grps').agg(**agg_rules)
Traceback (most recent call last):

  File "<ipython-input-14-c0f4606d675a>", line 2, in <module>
    res = df.groupby('grps').agg(**agg_rules)

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 945, in aggregate
    result, how = aggregate(self, func, *args, **kwargs)

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/aggregation.py", line 582, in aggregate
    return agg_dict_like(obj, arg, _axis), True

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/aggregation.py", line 768, in agg_dict_like
    results = {key: obj._gotitem(key, ndim=1).agg(how) for key, how in arg.items()}

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/aggregation.py", line 768, in <dictcomp>
    results = {key: obj._gotitem(key, ndim=1).agg(how) for key, how in arg.items()}

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 247, in aggregate
    ret = self._aggregate_multiple_funcs(func)

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 315, in _aggregate_multiple_funcs
    results[base.OutputKey(label=name, position=idx)] = obj.aggregate(func)

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 241, in aggregate
    return getattr(self, func)(*args, **kwargs)

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 2497, in cumsum
    return self._cython_transform("cumsum", **kwargs)

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 996, in _cython_transform
    raise DataError("No numeric types to aggregate")

DataError: No numeric types to aggregate

Problem description

`cumsum` works on a Timedelta column when called directly on the Series (`df['td'].cumsum()`), but it raises `DataError: No numeric types to aggregate` when requested through groupby with named aggregation.
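One possible workaround, sketched here under assumptions and not taken from the issue thread, is to run the grouped cumulative sum on a plain numeric column and convert the result back to Timedelta. The column name `td_cumsum` is hypothetical, and the `td` values are copied from the `Out[16]` table above. The sketch sidesteps named aggregation entirely, since `cumsum` returns one value per row and not one per group.

```python
import pandas as pd

# Same 'td' and 'grps' values as in the Out[16] table of the issue.
df = pd.DataFrame({
    'td': pd.to_timedelta(['1h', '15min', '1h15min', '1h50min', '1h10min']),
    'grps': ['a', 'a', 'b', 'b', 'b'],
})

# Convert the Timedelta column to float seconds so the grouped cumsum
# only ever sees a numeric dtype.
seconds = df['td'].dt.total_seconds()

# Cumulative sum within each group; the result keeps the original row index.
cum_seconds = seconds.groupby(df['grps']).cumsum()

# Convert back to Timedelta. 'td_cumsum' is a hypothetical column name.
df['td_cumsum'] = pd.to_timedelta(cum_seconds, unit='s')
print(df)
```

Going through float seconds is fine for whole-second durations like these; for sub-microsecond precision an integer nanosecond representation would be a safer intermediate.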

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit           : 2cb96529396d93b46abab7bbc73a208e708c642e
python           : 3.8.5.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.8.0-53-generic
Version          : #60~20.04.1-Ubuntu SMP Thu May 6 09:52:46 UTC 2021
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : fr_FR.UTF-8
LOCALE           : fr_FR.UTF-8

pandas           : 1.2.4
numpy            : 1.20.1
pytz             : 2021.1
dateutil         : 2.8.1
pip              : 21.1.1
setuptools       : 52.0.0.post20210125
Cython           : 0.29.23
pytest           : 6.2.3
hypothesis       : None
sphinx           : 4.0.1
blosc            : None
feather          : None
xlsxwriter       : 1.3.8
lxml.etree       : 4.6.3
html5lib         : 1.1
pymysql          : None
psycopg2         : None
jinja2           : 3.0.0
IPython          : 7.22.0
pandas_datareader: None
bs4              : 4.9.3
bottleneck       : 1.3.2
fsspec           : 0.9.0
fastparquet      : 0.6.3
gcsfs            : None
matplotlib       : 3.3.4
numexpr          : 2.7.3
odfpy            : None
openpyxl         : 3.0.7
pandas_gbq       : None
pyarrow          : 2.0.0
pyxlsb           : None
s3fs             : None
scipy            : 1.6.2
sqlalchemy       : 1.4.15
tables           : 3.6.1
tabulate         : None
xarray           : None
xlrd             : 2.0.1
xlwt             : 1.3.0
numba            : 0.51.2

Top GitHub Comments

WillAyd commented, Dec 3, 2022

No, they aren’t the same: the first is just a normal cumulative sum, whereas the latter is a groupby, so it should respect the groupings.
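The distinction drawn in this comment can be illustrated with the integer `amount` column from the issue's dataset. This is only a sketch: the comment it replies to is not shown here, so reading "the first" as a plain Series `cumsum` and "the latter" as a grouped one is an assumption. The variable names `plain` and `grouped` are hypothetical.

```python
import pandas as pd

# 'amount' and 'grps' values as in the Out[16] table of the issue.
df = pd.DataFrame({
    'amount': [20, 10, 50, 40, 30],
    'grps': ['a', 'a', 'b', 'b', 'b'],
})

# Plain cumulative sum: runs over the whole column, ignoring the groups.
plain = df['amount'].cumsum()

# Grouped cumulative sum: restarts at the first row of each group.
grouped = df.groupby('grps')['amount'].cumsum()

print(pd.DataFrame({'plain': plain, 'grouped': grouped}))
```

Both results have one row per input row; only the grouped version resets the running total when the group changes.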

seanjedi commented, Dec 3, 2022

@WillAyd Added a new test here: https://github.com/pandas-dev/pandas/pull/50033 I’m sure there will be a lot of things I can improve, as this is my first test with pandas 😅