BUG: not possible to 'cumsum' Timedelta with named aggregation

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd
import numpy as np
import random

# Dataset
ts = pd.DatetimeIndex([pd.Timestamp('2021/01/01 00:30'),
                       pd.Timestamp('2021/01/01 00:45'),
                       pd.Timestamp('2021/01/01 02:00'),
                       pd.Timestamp('2021/01/01 03:50'),
                       pd.Timestamp('2021/01/01 05:00')])
length = len(ts)
random.seed(1)
value = random.sample(range(1, length+1), length)
df = pd.DataFrame({'value': value, 'ts': ts})
df['td'] = df['ts'] - df['ts'].shift(1, fill_value=ts[0]-pd.Timedelta('1h'))
df['amount'] = df['value']*10
df.loc[:2,'grps'] = 'a'
df.loc[2:,'grps'] = 'b'
In [16]: df
Out[16]: 
   value                  ts              td  amount grps
0      2 2021-01-01 00:30:00 0 days 01:00:00      20    a
1      1 2021-01-01 00:45:00 0 days 00:15:00      10    a
2      5 2021-01-01 02:00:00 0 days 01:15:00      50    b
3      4 2021-01-01 03:50:00 0 days 01:50:00      40    b
4      3 2021-01-01 05:00:00 0 days 01:10:00      30    b
# cumsum and Timedelta work when used through pd.Series
df['td'].cumsum()

# cumsum and Timedelta do not work when used in groupby with named aggregation
agg_rules = {'td':('td', 'cumsum')}
res = df.groupby('grps').agg(**agg_rules)

Error message that is returned

res = df.groupby('grps').agg(**agg_rules)
Traceback (most recent call last):

  File "<ipython-input-14-c0f4606d675a>", line 2, in <module>
    res = df.groupby('grps').agg(**agg_rules)

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 945, in aggregate
    result, how = aggregate(self, func, *args, **kwargs)

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/aggregation.py", line 582, in aggregate
    return agg_dict_like(obj, arg, _axis), True

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/aggregation.py", line 768, in agg_dict_like
    results = {key: obj._gotitem(key, ndim=1).agg(how) for key, how in arg.items()}

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/aggregation.py", line 768, in <dictcomp>
    results = {key: obj._gotitem(key, ndim=1).agg(how) for key, how in arg.items()}

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 247, in aggregate
    ret = self._aggregate_multiple_funcs(func)

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 315, in _aggregate_multiple_funcs
    results[base.OutputKey(label=name, position=idx)] = obj.aggregate(func)

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 241, in aggregate
    return getattr(self, func)(*args, **kwargs)

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 2497, in cumsum
    return self._cython_transform("cumsum", **kwargs)

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 996, in _cython_transform
    raise DataError("No numeric types to aggregate")

DataError: No numeric types to aggregate

Problem description

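cumsum works on a Timedelta column when called directly on a Series (df['td'].cumsum()), but raises DataError: No numeric types to aggregate when the same operation is requested through groupby with named aggregation.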
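A possible workaround, offered as a sketch and not as a fix confirmed in the issue, is to convert the Timedelta column to float seconds, take the grouped cumulative sum on that numeric column, and convert the result back. The names secs, cum_secs and td_cumsum are illustrative only; the data reproduces the td and grps columns from the example above.

```python
import pandas as pd

# Same 'td' and 'grps' values as in the issue's example DataFrame.
df = pd.DataFrame({
    "grps": ["a", "a", "b", "b", "b"],
    "td": pd.to_timedelta(
        ["01:00:00", "00:15:00", "01:15:00", "01:50:00", "01:10:00"]
    ),
})

# Assumption: routing through a numeric dtype sidesteps the
# "No numeric types to aggregate" check hit by the Timedelta column.
secs = df["td"].dt.total_seconds()            # float seconds per row
cum_secs = secs.groupby(df["grps"]).cumsum()  # cumulative sum within each group
df["td_cumsum"] = pd.to_timedelta(cum_secs, unit="s")

print(df)
```

The result has one row per input row, because cumsum is a transformation and not a reduction; the running total restarts at the first row of each group.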

Output of pd.show_versions()

INSTALLED VERSIONS

commit           : 2cb96529396d93b46abab7bbc73a208e708c642e
python           : 3.8.5.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.8.0-53-generic
Version          : #60~20.04.1-Ubuntu SMP Thu May 6 09:52:46 UTC 2021
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : fr_FR.UTF-8
LOCALE           : fr_FR.UTF-8

pandas           : 1.2.4
numpy            : 1.20.1
pytz             : 2021.1
dateutil         : 2.8.1
pip              : 21.1.1
setuptools       : 52.0.0.post20210125
Cython           : 0.29.23
pytest           : 6.2.3
hypothesis       : None
sphinx           : 4.0.1
blosc            : None
feather          : None
xlsxwriter       : 1.3.8
lxml.etree       : 4.6.3
html5lib         : 1.1
pymysql          : None
psycopg2         : None
jinja2           : 3.0.0
IPython          : 7.22.0
pandas_datareader: None
bs4              : 4.9.3
bottleneck       : 1.3.2
fsspec           : 0.9.0
fastparquet      : 0.6.3
gcsfs            : None
matplotlib       : 3.3.4
numexpr          : 2.7.3
odfpy            : None
openpyxl         : 3.0.7
pandas_gbq       : None
pyarrow          : 2.0.0
pyxlsb           : None
s3fs             : None
scipy            : 1.6.2
sqlalchemy       : 1.4.15
tables           : 3.6.1
tabulate         : None
xarray           : None
xlrd             : 2.0.1
xlwt             : 1.3.0
numba            : 0.51.2

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

WillAyd commented, Dec 3, 2022 (1 reaction)

No, they aren’t the same: the first is just a normal cumulative sum, whereas the latter is a groupby, so it should respect the groupings.
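The comment does not quote the two calls it compares. Assuming it contrasts a plain Series cumsum with a grouped one, the difference can be illustrated on the integer amount column from the example, which does not involve Timedelta at all. The names plain and grouped are illustrative.

```python
import pandas as pd

# 'amount' and 'grps' values taken from the issue's example DataFrame.
df = pd.DataFrame({
    "grps": ["a", "a", "b", "b", "b"],
    "amount": [20, 10, 50, 40, 30],
})

# Plain cumulative sum: one running total over the whole column.
plain = df["amount"].cumsum()

# Grouped cumulative sum: the running total restarts for each group.
grouped = df.groupby("grps")["amount"].cumsum()

print(pd.DataFrame({"grps": df["grps"], "plain": plain, "grouped": grouped}))
```

The two columns agree within group a and diverge from the first row of group b onward, where the grouped total restarts.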

seanjedi commented, Dec 3, 2022 (0 reactions)

@WillAyd Added a new test here: https://github.com/pandas-dev/pandas/pull/50033 I’m sure there will be a lot of things I can improve, as this is my first test with Pandas 😅

