BUG: crash in df.groupby.rolling.mean with forward window indexer

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

Here is code that reproduces the crash:
import numpy as np
import pandas as pd

df = pd.DataFrame({'B': [np.nan, 1, 2, np.nan, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]})
df['A'] = 1
indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=12)  # forward-looking window of 12 rows
# The first run succeeds
df['mean'] = df.groupby('A')['B'].rolling(window=indexer, min_periods=1).mean().reset_index().loc[:, 'B']
# Running the identical line a second time crashes the interpreter (segfault)
df['mean'] = df.groupby('A')['B'].rolling(window=indexer, min_periods=1).mean().reset_index().loc[:, 'B']

Without groupby, it does not crash.
Without the forward window indexer, it does not crash.
The first run does not crash; only the second run does.
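A possible workaround, assuming (as the observations above suggest) that the crash needs both groupby and FixedForwardWindowIndexer: compute the forward-looking mean by reversing each group, applying an ordinary backward rolling window, and reversing back. `forward_rolling_mean` and `WINDOW` are hypothetical names, and this sketch has not been verified against pandas 1.3.1 specifically.

```python
import numpy as np
import pandas as pd

# Same data as the reproduction above.
df = pd.DataFrame({"B": [np.nan, 1, 2, np.nan, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]})
df["A"] = 1

WINDOW = 12  # forward window length, matching FixedForwardWindowIndexer(window_size=12)


def forward_rolling_mean(s: pd.Series, window: int, min_periods: int = 1) -> pd.Series:
    """Forward-looking rolling mean built from an ordinary (backward) rolling window.

    Reversing the series turns the forward window [i, i + window) into a
    backward window ending at the reversed position, so no custom indexer is used.
    """
    reversed_mean = s.iloc[::-1].rolling(window=window, min_periods=min_periods).mean()
    # Reverse back so the result lines up with the original row order.
    return reversed_mean.iloc[::-1]


# transform() aligns the per-group results back onto df's original index,
# which also removes the need for the reset_index()/.loc[:, 'B'] step.
df["mean"] = df.groupby("A")["B"].transform(forward_rolling_mean, window=WINDOW)
df["mean"] = df.groupby("A")["B"].transform(forward_rolling_mean, window=WINDOW)  # second run
```

Each output value still counts only the non-NaN values in rows i through i + 11, so `min_periods` keeps the same meaning as in the original call.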

Problem description

When df.groupby(...).rolling(...).mean() is used with a forward-looking window (FixedForwardWindowIndexer), the first call works, but running the same call again crashes the Python process.

Expected Output
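The report leaves this section empty. As a hedged sketch, the values both runs would be expected to produce can be computed without pandas' rolling machinery; `forward_window_mean` is a hypothetical helper that assumes row i uses the window rows i to min(i + window_size, n) - 1, which is how FixedForwardWindowIndexer bounds are documented.

```python
import numpy as np


def forward_window_mean(values, window_size, min_periods=1):
    """Reference forward-looking rolling mean, independent of pandas.

    Row i uses values[i:min(i + window_size, n)]; NaNs are skipped and the
    result is NaN when fewer than min_periods non-NaN values are present.
    """
    values = np.asarray(values, dtype=float)
    n = len(values)
    out = np.full(n, np.nan)
    for i in range(n):
        window = values[i:min(i + window_size, n)]
        valid = window[~np.isnan(window)]
        if len(valid) > 0 and len(valid) >= min_periods:
            out[i] = valid.mean()
    return out


B = [np.nan, 1, 2, np.nan, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
expected = forward_window_mean(B, window_size=12)
# expected[0] is 6.3: the mean of the ten non-NaN values in rows 0 to 11.
```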

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit : c7f7443c1bad8262358114d5e88cd9c8a308e8aa
python : 3.8.11.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19042
machine : AMD64
processor : Intel64 Family 6 Model 165 Stepping 2, GenuineIntel
byteorder : little
LC_ALL : None
LANG : zh_CN
LOCALE : Chinese (Simplified)_China.936

pandas : 1.3.1
numpy : 1.20.3
pytz : 2021.1
dateutil : 2.8.2
pip : 21.2.2
setuptools : 52.0.0.post20210125
Cython : 0.29.24
pytest : 6.2.4
hypothesis : None
sphinx : 4.0.2
blosc : None
feather : None
xlsxwriter : 3.0.1
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.26.0
pandas_datareader : None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 2021.07.0
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.2
sqlalchemy : 1.4.22
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.53.1

Issue Analytics

Created 2 years ago
Comments: 8 (7 by maintainers)

Top GitHub Comments

1 reaction

mroeschke commented, Aug 29, 2021

Only the indexers listed at https://pandas.pydata.org/pandas-docs/stable/reference/window.html#window-indexer are public, so the others, including GroupbyIndexer, can have their spelling mistakes fixed.

Thanks for looking into a fix.

0 reactions

simonjayhawkins commented, Aug 30, 2021

Okay, I think I have a viable fix for both, and new tests have been added to verify the behaviour is sane in this and other cases.

The code sample did not segfault in pandas 1.1.5.

first bad commit: [a8e2f92abd43c32e729704e0872171daa1c804f7] REF: Simplifying creation of window indexers internally (#37177)

Since not all users update on every pandas release, we could perhaps consider backporting the fix for the segfault if the two issues can be kept independent.
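To check whether a given installed pandas version (for example 1.1.5 versus 1.3.1) segfaults on the sample, the reproduction has to run in a child process, because a segfault kills the interpreter that hits it. The sketch below assumes that approach; `REPRO` and `runs_cleanly` are hypothetical names, and any non-zero exit (including an ordinary Python exception such as a missing import) is treated as a failure.

```python
import subprocess
import sys

# The reproduction from the report, running the crashing line twice.
REPRO = """
import numpy as np
import pandas as pd

df = pd.DataFrame({'B': [np.nan, 1, 2, np.nan, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]})
df['A'] = 1
indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=12)
for _ in range(2):
    df['mean'] = df.groupby('A')['B'].rolling(window=indexer, min_periods=1).mean().reset_index().loc[:, 'B']
"""


def runs_cleanly(code: str, timeout: float = 120.0) -> bool:
    """Run `code` in a fresh interpreter and report whether it exited with status 0.

    On POSIX, a child killed by a signal (e.g. SIGSEGV) gets a negative
    return code, so a segfault is reported as False rather than crashing us.
    """
    proc = subprocess.run([sys.executable, "-c", code], capture_output=True, timeout=timeout)
    return proc.returncode == 0
```

Calling `runs_cleanly(REPRO)` in each environment gives a yes/no answer per version. For `git bisect run`, a wrapper script could end with `sys.exit(0 if runs_cleanly(REPRO) else 1)`, since bisect treats exit code 0 as good and 1 as bad, whereas a raw segfault exit status above 127 would abort the bisect.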
