Rolling groupby should not maintain the by column in the resulting DataFrame

I found another oddity while digging through #13966.

Begin with the initial DataFrame in that issue:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1] * 20 + [2] * 12 + [3] * 8,
                   'B': np.arange(40)})

Save the grouping:

In [215]: g = df.groupby('A')

Compute the rolling sum:

In [216]: r = g.rolling(4)

In [217]: r.sum()
Out[217]:
         A      B
A
1 0    NaN    NaN
  1    NaN    NaN
  2    NaN    NaN
  3    4.0    6.0
  4    4.0   10.0
  5    4.0   14.0
  6    4.0   18.0
  7    4.0   22.0
  8    4.0   26.0
  9    4.0   30.0
...    ...    ...
2 30   8.0  114.0
  31   8.0  118.0
3 32   NaN    NaN
  33   NaN    NaN
  34   NaN    NaN
  35  12.0  134.0
  36  12.0  138.0
  37  12.0  142.0
  38  12.0  146.0
  39  12.0  150.0

[40 rows x 2 columns]

It maintains the by column (A)! That column should not be in the resulting DataFrame.

It gets weirder if I compute the sum over the entire grouping and then redo the rolling calculation. Now the by column is gone, as expected:

In [218]: g.sum()
Out[218]:
     B
A
1  190
2  306
3  284

In [219]: r.sum()
Out[219]:
          B
A
1 0     NaN
  1     NaN
  2     NaN
  3     6.0
  4    10.0
  5    14.0
  6    18.0
  7    22.0
  8    26.0
  9    30.0
...     ...
2 30  114.0
  31  118.0
3 32    NaN
  33    NaN
  34    NaN
  35  134.0
  36  138.0
  37  142.0
  38  146.0
  39  150.0

[40 rows x 1 columns]

So the grouping summation has some sort of side effect.
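The before/after comparison can be scripted so it is easy to rerun on a different pandas version. This is only a sketch: the names cols_before and cols_after are made up for illustration, and whether the two lists differ depends on the installed pandas version, so no expected output is shown.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1] * 20 + [2] * 12 + [3] * 8,
                   'B': np.arange(40)})

g = df.groupby('A')
r = g.rolling(4)

# Columns of the rolling sum before any full-group aggregation has run.
cols_before = list(r.sum().columns)

# Full-group aggregation; the result is discarded, only the call matters.
g.sum()

# Columns of the same rolling sum after g.sum() has been called.
cols_after = list(r.sum().columns)

print("before:", cols_before)
print("after: ", cols_after)
print("side effect present:", cols_before != cols_after)
```

If the last line prints True, the installed version shows the side effect described above.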

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 10 (6 by maintainers)

Top GitHub Comments

andreas-vester commented, Feb 20, 2020 (10 reactions)

The problem still exists in v1.0.1

chiachong commented, Aug 1, 2019 (5 reactions)

Still the same problem in 0.25.

Workaround: df.groupby('A').rolling(4).sum().reset_index(level=0, drop=True)
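
Spelled out on the DataFrame from the issue, the workaround might look like the sketch below. The reset_index call only removes the group level from the index; the final drop of column 'A' is an extra defensive step assumed here, not part of the original comment, for versions that still keep the by column in the result. The names rolled and flat are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1] * 20 + [2] * 12 + [3] * 8,
                   'B': np.arange(40)})

rolled = df.groupby('A').rolling(4).sum()

# Remove the group level from the MultiIndex so the result is indexed
# like the original DataFrame again.
flat = rolled.reset_index(level=0, drop=True)

# Assumption, not in the original workaround: drop the by column if this
# pandas version kept it; errors='ignore' makes this a no-op otherwise.
flat = flat.drop(columns='A', errors='ignore')

print(flat.head(6))
```

Because the result shares its index with df, it can be assigned back directly, for example as df['B_roll'] = flat['B'] (B_roll being a hypothetical column name).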
