BUG: crash in df.groupby.rolling.mean with forward window indexer

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

The following code reproduces the crash:
import numpy as np
import pandas as pd

df = pd.DataFrame({'B': [np.nan, 1, 2, np.nan, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]})
df['A'] = 1  # put every row in a single group
indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=12)  # forward-looking window of 12 periods
# The first call works
df['mean'] = df.groupby('A')['B'].rolling(window=indexer, min_periods=1).mean().reset_index().loc[:, 'B']
# The second, identical call crashes
df['mean'] = df.groupby('A')['B'].rolling(window=indexer, min_periods=1).mean().reset_index().loc[:, 'B']

Without the groupby, it does not crash.
Without the forward window indexer, it does not crash.
The first call never crashes; only the repeated call does.
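
To check the no-groupby condition in isolation, the sketch below applies the same forward indexer directly to the column and compares the result with a slow reference computed by slicing. The names forward_mean and expected are illustrative and are not part of the original report.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'B': [np.nan, 1, 2, np.nan, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]})
indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=12)

# Forward-looking rolling mean on the plain column (no groupby involved)
forward_mean = df['B'].rolling(window=indexer, min_periods=1).mean()

# Slow reference: mean of rows i .. i+11, skipping NaN (Series.mean skips NaN by default)
expected = pd.Series([df['B'].iloc[i:i + 12].mean() for i in range(len(df))])

print(np.allclose(forward_mean, expected))
```

Running the rolling call several times in a row on the plain column is a quick way to confirm that the problem needs the groupby to appear.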


Problem description

Calling df.groupby(...).rolling(...).mean() with a forward-looking window indexer works the first time, but running the same call a second time crashes the process.
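
Until a fixed release is available, one possible workaround (an assumption on my side, not something proposed in the issue) is to avoid the forward indexer altogether: reverse each group, apply an ordinary backward-looking window, and reverse the result back. The helper name forward_rolling_mean is hypothetical.

```python
import numpy as np
import pandas as pd


def forward_rolling_mean(frame, key, col, window_size):
    """Per-group forward-looking rolling mean without a forward window indexer."""
    def _forward(s):
        # A backward window over the reversed series covers rows i .. i+window_size-1
        # of the original order; reversing again restores the original order.
        return s.iloc[::-1].rolling(window_size, min_periods=1).mean().iloc[::-1]

    # transform keeps the result aligned with the rows of the input frame
    return frame.groupby(key)[col].transform(_forward)


df = pd.DataFrame({'B': [np.nan, 1, 2, np.nan, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]})
df['A'] = 1

first = forward_rolling_mean(df, 'A', 'B', 12)
second = forward_rolling_mean(df, 'A', 'B', 12)  # repeated call, same code path
df['mean'] = second
```

This goes through plain Series.rolling inside each group rather than the groupby rolling path, so it is likely slower on frames with many groups.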

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : c7f7443c1bad8262358114d5e88cd9c8a308e8aa
python : 3.8.11.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19042
machine : AMD64
processor : Intel64 Family 6 Model 165 Stepping 2, GenuineIntel
byteorder : little
LC_ALL : None
LANG : zh_CN
LOCALE : Chinese (Simplified)_China.936

pandas : 1.3.1
numpy : 1.20.3
pytz : 2021.1
dateutil : 2.8.2
pip : 21.2.2
setuptools : 52.0.0.post20210125
Cython : 0.29.24
pytest : 6.2.4
hypothesis : None
sphinx : 4.0.2
blosc : None
feather : None
xlsxwriter : 3.0.1
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.26.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 2021.07.0
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.2
sqlalchemy : 1.4.22
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.53.1

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (7 by maintainers)

Top GitHub Comments

1 reaction
mroeschke commented, Aug 29, 2021

Only the indexers listed at https://pandas.pydata.org/pandas-docs/stable/reference/window.html#window-indexer are public, so the others, including GroupbyIndexer, can have their spelling mistakes fixed.

Thanks for looking into a fix.

0 reactions
simonjayhawkins commented, Aug 30, 2021

Okay, I think I have a viable fix for both, and new tests have been added to verify the behaviour is sane in this and other cases.

The code sample did not segfault in pandas 1.1.5

first bad commit: [a8e2f92abd43c32e729704e0872171daa1c804f7] REF: Simplifying creation of window indexers internally (#37177)

Since not all users update on every pandas release, we could perhaps consider backporting the fix for the segfault issue, provided the two issues can be kept independent.
