DOC: Clarify pandas.DataFrame.rolling() when using different values for the parameter "closed"

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas as pd

df = pd.DataFrame({'A': [1] * 5 + [0] * 5})
win_size = 4
df['left'] = df['A'].rolling(win_size, closed='left').sum()
df['right'] = df['A'].rolling(win_size, closed='right').sum()
df['both'] = df['A'].rolling(win_size, closed='both').sum()
df['neither'] = df['A'].rolling(win_size, closed='neither').sum()
df
Out[6]: 
   A  left  right  both  neither
0  1   NaN    NaN   NaN      NaN
1  1   NaN    NaN   NaN      NaN
2  1   NaN    NaN   NaN      NaN
3  1   NaN    4.0   4.0      NaN
4  1   4.0    4.0   5.0      NaN
5  0   4.0    3.0   4.0      NaN
6  0   3.0    2.0   3.0      NaN
7  0   2.0    1.0   2.0      NaN
8  0   1.0    0.0   1.0      NaN
9  0   0.0    0.0   0.0      NaN

Issue Description

There seem to be some inconsistencies in the behavior of the rolling function related to the parameter 'closed'. Each option was tested and assigned to its own column in the toy example above. First, 'neither' returns only NaNs, as shown in the 'neither' column of the output. Second, with 'right' or 'left' the count reaches the full window size, which is not coherent: if one of the endpoints is excluded, the maximum should be win_size - 1. What is more, with closed='both' we even get a count greater than the window size, in this case 5 vs. 4. It seems that the closed parameter affects the window position and shape rather than just the inclusion of the endpoints. If this is not a bug, the exact behavior should be described more clearly.

Expected Behavior

1- Using 'neither' should yield the same result as 'left' minus 1. This parameter value is actually redundant, since the same result could be obtained by taking a window one unit smaller with closed set to 'left'.
2- Using 'left' or 'right' should exclude one of the endpoints from the calculation and never use the whole window.
3- Using 'both' should use a window of exactly the size given by the parameter 'window', inclusive of the current row.

Installed Versions

INSTALLED VERSIONS

commit : 945c9ed766a61c7d2c0a7cbb251b6edebf9cb7d5
python : 3.9.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19043
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_Canada.1252
pandas : 1.3.4
numpy : 1.19.5
pytz : 2021.1
dateutil : 2.8.1
pip : 21.3.1
setuptools : 58.2.0
Cython : 0.29.24
pytest : 6.2.5
hypothesis : 6.23.2
sphinx : 4.0.2
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.29.0
pandas_datareader : None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 4.0.1
pyxlsb : None
s3fs : None
scipy : 1.7.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : 0.54.1

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

Iqigai commented, Dec 2, 2021

Description and title updated.

dicristina commented, Jul 11, 2022

This is not a bug but the documentation could be clearer, particularly the documentation for Series.rolling and DataFrame.rolling. The user guide states:

The windows are comprised by looking back the length of the window from the current observation.

I like to think that a fixed window always consists of the current row plus window rows before the current row and that the first or the last (current) row might be excluded according to the closed parameter. If closed is set to 'both' then no rows will be excluded which means that window+1 rows will be used for the result, window rows are used for 'left' and 'right', and window-1 rows are used when closed is set to 'neither'.
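The row counts described above can be checked with a small sketch. It is only an illustration, not part of the original thread: it assumes pandas 1.2 or later (where closed is accepted for fixed windows), and the names ones, rows_used and max_rows are made up for this example. Summing a series of ones with min_periods=1 gives the number of rows that fall inside each window.

```python
import pandas as pd

win_size = 4
ones = pd.Series([1] * 10)  # summing ones counts the rows in each window

# min_periods=1 so that partially filled windows near the start are not masked as NaN.
rows_used = pd.DataFrame({
    closed: ones.rolling(win_size, closed=closed, min_periods=1).sum()
    for closed in ("right", "left", "both", "neither")
})

# Largest number of rows any window contains, per value of closed.
max_rows = rows_used.max()
print(max_rows)
```

Under these assumptions the maximum is win_size for 'right' and 'left', win_size + 1 for 'both', and win_size - 1 for 'neither', which matches the mental model in the comment.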

The 'neither' column in the example resulted in all NaN values because the min_periods parameter is automatically set to window for fixed windows. To avoid this problem min_periods has to be set to a value lower than window whenever closed='neither'.
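As a minimal sketch of that workaround, reusing the data from the reproducible example (the choice of min_periods=win_size - 1 is one possible value, picked here because it is the largest number of rows a 'neither' window can hold):

```python
import pandas as pd

df = pd.DataFrame({"A": [1] * 5 + [0] * 5})
win_size = 4

# With closed='neither' a window holds at most win_size - 1 rows,
# so min_periods must not exceed that or every result is NaN.
neither = df["A"].rolling(
    win_size, closed="neither", min_periods=win_size - 1
).sum()
print(neither.tolist())
```

With this setting the first three rows are still NaN (their windows hold fewer than three rows), and rows 3 through 9 come out as 3, 3, 3, 2, 1, 0, 0.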
