"Exception: Column(s) <cols> already selected" when using groupby, resample, and agg
Code Sample, a copy-pastable example if possible
from datetime import datetime, timedelta
import pandas as pd
times = [datetime.now() + timedelta(hours=i) for i in range(10)] * 2
ids = ['foo', 'bar'] * 10
df = pd.DataFrame({'time': times, 'id': ids, 'value': range(20)})
Problem description
This line:
df.set_index('time').groupby('id').resample('H').agg(['mean', 'std'])
causes this error:
Exception: Column(s) id already selected
However, invoking mean and std individually works as expected:
df.set_index('time').groupby('id').resample('H').mean()
df.set_index('time').groupby('id').resample('H').std()
Expected Output
A dataframe with the combined outputs of:
>>> df.set_index('time').groupby('id').resample('H').mean()
                         value
id  time
bar 2018-11-06 12:00:00    6.0
    2018-11-06 13:00:00    NaN
    2018-11-06 14:00:00    8.0
    2018-11-06 15:00:00    NaN
    2018-11-06 16:00:00   10.0
    2018-11-06 17:00:00    NaN
    2018-11-06 18:00:00   12.0
    2018-11-06 19:00:00    NaN
    2018-11-06 20:00:00   14.0
foo 2018-11-06 11:00:00    5.0
    2018-11-06 12:00:00    NaN
    2018-11-06 13:00:00    7.0
    2018-11-06 14:00:00    NaN
    2018-11-06 15:00:00    9.0
    2018-11-06 16:00:00    NaN
    2018-11-06 17:00:00   11.0
    2018-11-06 18:00:00    NaN
    2018-11-06 19:00:00   13.0
>>> df.set_index('time').groupby('id').resample('H').std()
                            value
id  time
bar 2018-11-06 12:00:00  7.071068
    2018-11-06 13:00:00       NaN
    2018-11-06 14:00:00  7.071068
    2018-11-06 15:00:00       NaN
    2018-11-06 16:00:00  7.071068
    2018-11-06 17:00:00       NaN
    2018-11-06 18:00:00  7.071068
    2018-11-06 19:00:00       NaN
    2018-11-06 20:00:00  7.071068
foo 2018-11-06 11:00:00  7.071068
    2018-11-06 12:00:00       NaN
    2018-11-06 13:00:00  7.071068
    2018-11-06 14:00:00       NaN
    2018-11-06 15:00:00  7.071068
    2018-11-06 16:00:00       NaN
    2018-11-06 17:00:00  7.071068
    2018-11-06 18:00:00       NaN
    2018-11-06 19:00:00  7.071068
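Until agg works here, the combined frame described above can be assembled by hand from the two individual calls. The following is only a sketch under a few assumptions that are not part of the original report: a fixed start time replaces datetime.now() so the result is reproducible, the value column is selected before resampling, and the hourly alias is written 'h' because newer pandas releases deprecate 'H'. The names grouped and combined are illustrative.

```python
from datetime import datetime, timedelta

import pandas as pd

# Assumption: a fixed start time instead of datetime.now(), for reproducibility.
start = datetime(2018, 11, 6, 11)
times = [start + timedelta(hours=i) for i in range(10)] * 2
ids = ['foo', 'bar'] * 10
df = pd.DataFrame({'time': times, 'id': ids, 'value': range(20)})

# Group by id and keep only the column that should be aggregated.
grouped = df.set_index('time').groupby('id')['value']

# Call each aggregation on its own (the calls that work in the report),
# then place the two resulting Series side by side as named columns.
combined = pd.concat(
    {
        'mean': grouped.resample('h').mean(),
        'std': grouped.resample('h').std(),
    },
    axis=1,
)
print(combined)
```

Note that this yields flat columns named mean and std, indexed by (id, time), rather than the two-level column header that agg(['mean', 'std']) would produce on a DataFrame.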
Output of pd.show_versions()
pd.show_versions()
INSTALLED VERSIONS
commit: None python: 3.6.6.final.0 python-bits: 64 OS: Darwin OS-release: 16.7.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: en_US.UTF-8
pandas: 0.23.3 pytest: 3.7.1 pip: 18.0 setuptools: 39.1.0 Cython: None numpy: 1.14.5 scipy: 1.1.0 pyarrow: None xarray: None IPython: 6.4.0 sphinx: 1.7.5 patsy: None dateutil: 2.7.3 pytz: 2018.5 blosc: None bottleneck: None tables: None numexpr: None feather: None matplotlib: 2.2.2 openpyxl: None xlrd: None xlwt: None xlsxwriter: None lxml: None bs4: None html5lib: 1.0.1 sqlalchemy: 1.2.12 pymysql: None psycopg2: None jinja2: 2.10 s3fs: None fastparquet: None pandas_gbq: None pandas_datareader: None
Issue Analytics
- State:
- Created 5 years ago
- Comments:6 (2 by maintainers)
Top GitHub Comments
@otmezger I’m avoiding agg and invoking individual methods until this is resolved. Not the best solution, unfortunately.
@trianta2 I’m having the same issue with
Have you found a workaround other than installing the release candidate for pandas 0.24?
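One possible alternative, not taken from this thread, is to skip the groupby/resample chain and bin the timestamps with pd.Grouper inside a single groupby, where agg with a list of functions is a standard call. This is a different technique from the one in the report and a sketch only: the fixed start time, the lowercase 'h' alias, and the name out are assumptions made for illustration.

```python
from datetime import datetime, timedelta

import pandas as pd

# Assumption: a fixed start time instead of datetime.now(), for reproducibility.
start = datetime(2018, 11, 6, 11)
times = [start + timedelta(hours=i) for i in range(10)] * 2
ids = ['foo', 'bar'] * 10
df = pd.DataFrame({'time': times, 'id': ids, 'value': range(20)})

# Group by id and by hourly bins of the time column in one step,
# then apply both aggregations to the value column.
out = (
    df.groupby(['id', pd.Grouper(key='time', freq='h')])['value']
      .agg(['mean', 'std'])
)
print(out)
```

As far as I can tell, this variant only returns the (id, hour) pairs that actually contain data, so the NaN rows that resample emits for empty hourly bins do not appear. If those rows matter, calling the methods individually remains the closer match to the expected output.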