question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Aggregate mean over TimeDelta column

See original GitHub issue

Cannot compute mean of timedelta in gropuby

In [1]: import pandas as pd
   ...: import numpy as np
   ...: print(pd.__version__)
   ...: 
   ...: time = pd.to_timedelta(np.random.randint(100,1000, size=10), unit='s')
   ...: cat = np.random.choice(['A','B','C'], size=10)
   ...: 
   ...: df = pd.DataFrame(dict(time=time, cat=cat))
   ...: 
   ...: print(df['time'].mean())
   ...: 
   ...: df.groupby('cat')['time'].mean()
   ...: 
0.20.3
0 days 00:07:15.800000
---------------------------------------------------------------------------
DataError                                 Traceback (most recent call last)
<ipython-input-1-b649ebd78333> in <module>()
     10 print(df['time'].mean())
     11 
---> 12 df.groupby('cat')['time'].mean()

/Users/adefusco/Applications/anaconda3/4.3/envs/module-pandas/lib/python3.6/site-packages/pandas/core/groupby.py in mean(self, *args, **kwargs)
   1035         nv.validate_groupby_func('mean', args, kwargs, ['numeric_only'])
   1036         try:
-> 1037             return self._cython_agg_general('mean', **kwargs)
   1038         except GroupByError:
   1039             raise

/Users/adefusco/Applications/anaconda3/4.3/envs/module-pandas/lib/python3.6/site-packages/pandas/core/groupby.py in _cython_agg_general(self, how, alt, numeric_only)
    834 
    835         if len(output) == 0:
--> 836             raise DataError('No numeric types to aggregate')
    837 
    838         return self._wrap_aggregated_output(output, names)

DataError: No numeric types to aggregate

Problem description

The mean of TimeDelta can be computed, but it is not working in GroupBy.

Expected Output

This is the output I would expect.


In [3]: def average_time(x):
   ...:     s = x.dt.seconds
   ...:     return pd.Timedelta(s.mean(), unit='s')
   ...: 
   ...: df.groupby('cat')['time'].apply(average_time)
   ...: 
Out[3]: 
cat
A   00:09:27.666667
B   00:06:18.666667
C   00:06:19.750000
Name: time, dtype: timedelta64[ns]

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0 python-bits: 64 OS: Darwin
OS-release: 16.7.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None
LANG: en_US.UTF-8 LOCALE: en_US.UTF-8

pandas: 0.20.3 pytest: None pip: 9.0.1 setuptools: 27.2.0 Cython: None numpy: 1.13.1 scipy: 0.19.1 xarray: None IPython: 6.0.0 sphinx: None patsy: 0.4.1 dateutil: 2.6.0 pytz: 2017.2 blosc: None bottleneck: None tables: 3.4.2 numexpr: 2.6.2 feather: None matplotlib: 2.0.2 openpyxl: 2.4.1 xlrd: 1.0.0 xlwt: 1.2.0 xlsxwriter: None lxml: None bs4: None html5lib: 0.999 sqlalchemy: 1.1.9 pymysql: None psycopg2: None jinja2: 2.9.6 s3fs: None pandas_gbq: None pandas_datareader: 0.2.1

Issue Analytics

  • State:closed
  • Created 6 years ago
  • Reactions:5
  • Comments:9 (4 by maintainers)

github_iconTop GitHub Comments

7reactions
chris-b1commented, Aug 31, 2017

The cython path will handle timedeltas as below

In [47]: df.groupby('cat')['time'].mean(numeric_only=False)
Out[47]: 
cat
A   00:09:46.666666
B   00:10:02.750000
Name: time, dtype: timedelta64[ns]

Arguably not a great API - we automatically exclude “non-numeric” columns on groupby ops. This makes sense with an entire DataFrame, but perhaps on a Series selection this should default to true?

7reactions
TomAugspurgercommented, Aug 30, 2017

I think this is just because we don’t have a cython implementation of group_mean in https://github.com/pandas-dev/pandas/blob/ad7d6fc0248edaf098537e5674dcc0c9dd059491/pandas/_libs/groupby_helper.pxi.in#L31 if you’re interested in taking a look

FYI, a slightly smaller workaround is df.groupby('cat')['time'].agg(lambda x: x.mean())

Read more comments on GitHub >

github_iconTop Results From Across the Web

Finding the mean and standard deviation of a timedelta object ...
I would like to calculate the mean and standard deviation of a timedelta by bank from a dataframe with two columns shown below....
Read more >
Aggregations on time-series data with Pandas - Zero with Dot
The simplest form of a time-series aggregation is to feed values into evenly spaced bins using an aggregating function. It helps to adjust...
Read more >
pandas.DataFrame.resample — pandas 1.5.2 documentation
For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals. For example, for '5min' frequency, base could range from 0...
Read more >
tecton.declarative.Aggregation
column (str) – Column name of the feature we are aggregating. · function (Union[str, AggregationFunction]) – One of the built-in aggregation functions.
Read more >
Pandas Timedelta mean returns error "No numeric types to ...
In my case, it's fine to have even a simple categorical result, so I can do something ... convert timedelta column to seconds...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found