"Exception: Column(s) <cols> already selected" when using groupby, resample, and agg

Code Sample, a copy-pastable example if possible

from datetime import datetime, timedelta
import pandas as pd

times = [datetime.now() + timedelta(hours=i) for i in range(10)] * 2
ids = ['foo', 'bar'] * 10
df = pd.DataFrame({'time': times, 'id': ids, 'value': range(20)})

Problem description

This line:

df.set_index('time').groupby('id').resample('H').agg(['mean', 'std'])

causes this error:

Exception: Column(s) id already selected

However, invoking mean and std individually works as expected:

df.set_index('time').groupby('id').resample('H').mean()
df.set_index('time').groupby('id').resample('H').std()

Expected Output

A dataframe with the combined outputs of:

>>> df.set_index('time').groupby('id').resample('H').mean()
                         value
id  time                      
bar 2018-11-06 12:00:00    6.0
    2018-11-06 13:00:00    NaN
    2018-11-06 14:00:00    8.0
    2018-11-06 15:00:00    NaN
    2018-11-06 16:00:00   10.0
    2018-11-06 17:00:00    NaN
    2018-11-06 18:00:00   12.0
    2018-11-06 19:00:00    NaN
    2018-11-06 20:00:00   14.0
foo 2018-11-06 11:00:00    5.0
    2018-11-06 12:00:00    NaN
    2018-11-06 13:00:00    7.0
    2018-11-06 14:00:00    NaN
    2018-11-06 15:00:00    9.0
    2018-11-06 16:00:00    NaN
    2018-11-06 17:00:00   11.0
    2018-11-06 18:00:00    NaN
    2018-11-06 19:00:00   13.0

>>> df.set_index('time').groupby('id').resample('H').std()
                            value
id  time                         
bar 2018-11-06 12:00:00  7.071068
    2018-11-06 13:00:00       NaN
    2018-11-06 14:00:00  7.071068
    2018-11-06 15:00:00       NaN
    2018-11-06 16:00:00  7.071068
    2018-11-06 17:00:00       NaN
    2018-11-06 18:00:00  7.071068
    2018-11-06 19:00:00       NaN
    2018-11-06 20:00:00  7.071068
foo 2018-11-06 11:00:00  7.071068
    2018-11-06 12:00:00       NaN
    2018-11-06 13:00:00  7.071068
    2018-11-06 14:00:00       NaN
    2018-11-06 15:00:00  7.071068
    2018-11-06 16:00:00       NaN
    2018-11-06 17:00:00  7.071068
    2018-11-06 18:00:00       NaN
    2018-11-06 19:00:00  7.071068

Output of pd.show_versions()

pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.3
pytest: 3.7.1
pip: 18.0
setuptools: 39.1.0
Cython: None
numpy: 1.14.5
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.5
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.2.12
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

3 reactions
trianta2 commented, Jan 16, 2019

@otmezger I’m avoiding agg and invoking individual methods until this is resolved. Not the best solution unfortunately.
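A minimal sketch of that workaround, assuming the example frame from the issue: call mean and std separately, then stitch the results together with pd.concat. The hourly helper is hypothetical and only there to avoid repeating the chain. A fixed start timestamp replaces datetime.now() so the result is reproducible, and "60min" is used instead of "H" because newer pandas releases deprecate the "H" alias.

```python
import pandas as pd

# Same shape as the frame in the issue, but with a fixed start time.
start = pd.Timestamp("2018-11-06 11:00:00")
times = [start + pd.Timedelta(hours=i) for i in range(10)] * 2
ids = ["foo", "bar"] * 10
df = pd.DataFrame({"time": times, "id": ids, "value": range(20)})


def hourly(frame):
    # Hypothetical helper: group by id, then resample "value" into hourly bins.
    return frame.set_index("time").groupby("id")["value"].resample("60min")


# Run each aggregation on its own instead of passing a list to agg(),
# then place the two Series side by side; the dict keys become column names.
combined = pd.concat(
    {"mean": hourly(df).mean(), "std": hourly(df).std()},
    axis=1,
)
print(combined)
```

The result is indexed by (id, time) with one column per statistic, which is close to what agg(['mean', 'std']) was expected to return, minus the extra column level for value.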

0 reactions
otmezger commented, Jan 15, 2019

@trianta2 I’m having the same issue with

dfr = df_sel.groupby('client_id').resample('5T').agg({
            'column_A': 'last',
            'column_B': 'last',
            'seen_time': ['min','max']
        })

Have you found a workaround other than installing the release candidate for pandas 0.24?
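One possible alternative, which is not proposed in the thread and is only a sketch: skip resample and put a pd.Grouper in the groupby keys, so the dict passed to agg goes through the ordinary groupby path. The data below is a hypothetical stand-in for df_sel, assumed to have a DatetimeIndex, and "5min" is used in place of "5T" because newer pandas releases deprecate the "T" alias.

```python
import pandas as pd

# Hypothetical stand-in for df_sel: a DatetimeIndex plus the columns
# named in the comment above.
index = pd.date_range("2019-01-15 00:00", periods=6, freq="2min", name="ts")
df_sel = pd.DataFrame(
    {
        "client_id": ["a", "a", "a", "b", "b", "b"],
        "column_A": [1, 2, 3, 4, 5, 6],
        "column_B": [10, 20, 30, 40, 50, 60],
        "seen_time": [5, 3, 9, 7, 1, 8],
    },
    index=index,
)

# Group by client and by 5-minute bins of the index in a single groupby,
# then aggregate with the same dict that was passed to resample().agg().
dfr = df_sel.groupby(["client_id", pd.Grouper(freq="5min")]).agg(
    {
        "column_A": "last",
        "column_B": "last",
        "seen_time": ["min", "max"],
    }
)
print(dfr)
```

Unlike resample, this grouping only returns bins that contain data, so empty intervals do not show up as NaN rows. If those rows matter, they would have to be added back afterwards.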
