DOC: Clarify pandas.DataFrame.rolling() when using different values for the parameter "closed"
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the master branch of pandas.
Reproducible Example
import pandas as pd
df = pd.DataFrame({'A': [1] * 5 + [0] * 5})
win_size = 4
df['left'] = df['A'].rolling(win_size, closed='left').sum()
df['right'] = df['A'].rolling(win_size, closed='right').sum()
df['both'] = df['A'].rolling(win_size, closed='both').sum()
df['neither'] = df['A'].rolling(win_size, closed='neither').sum()
df
Out[6]:
   A  left  right  both  neither
0  1   NaN    NaN   NaN      NaN
1  1   NaN    NaN   NaN      NaN
2  1   NaN    NaN   NaN      NaN
3  1   NaN    4.0   4.0      NaN
4  1   4.0    4.0   5.0      NaN
5  0   4.0    3.0   4.0      NaN
6  0   3.0    2.0   3.0      NaN
7  0   2.0    1.0   2.0      NaN
8  0   1.0    0.0   1.0      NaN
9  0   0.0    0.0   0.0      NaN
Issue Description
There seem to be some inconsistencies in the behavior of the rolling function related to the parameter ‘closed’. Different options were tested, each assigned to its own column in the toy example above. First, ‘neither’ returns only NaNs, as the ‘neither’ column in the output shows. Second, with ‘right’ or ‘left’ the count reaches the full window size, which is not coherent: if one of the endpoints is excluded, the maximum should be win_size - 1. What is more, with closed=‘both’ the count even exceeds the window size (5 vs. 4 in this case). It seems that the closed parameter affects the window position and shape rather than just the inclusion of the endpoints. The exact behavior should be better described if this is not a bug.
Expected Behavior
1- Using ‘neither’ should yield the same result as ‘left’ minus 1. This parameter value is actually redundant, since the same result could be obtained by taking a window size one unit smaller with closed set to ‘left’.
2- Using ‘left’ or ‘right’ should exclude one of the endpoints from the calculation and never take the whole window.
3- Using ‘both’ should use a window of exactly the size given by the parameter ‘window’, inclusive of the current row.
Installed Versions
INSTALLED VERSIONS
commit : 945c9ed766a61c7d2c0a7cbb251b6edebf9cb7d5
python : 3.9.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19043
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_Canada.1252
pandas : 1.3.4
numpy : 1.19.5
pytz : 2021.1
dateutil : 2.8.1
pip : 21.3.1
setuptools : 58.2.0
Cython : 0.29.24
pytest : 6.2.5
hypothesis : 6.23.2
sphinx : 4.0.2
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.29.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 4.0.1
pyxlsb : None
s3fs : None
scipy : 1.7.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : 0.54.1
Issue Analytics
- State:
- Created 2 years ago
- Comments:5 (3 by maintainers)
Top GitHub Comments
Description and title updated.
This is not a bug but the documentation could be clearer, particularly the documentation for Series.rolling and DataFrame.rolling. The user guide states:
I like to think that a fixed window always consists of the current row plus window rows before the current row and that the first or the last (current) row might be excluded according to the closed parameter. If closed is set to 'both' then no rows will be excluded, which means that window+1 rows will be used for the result; window rows are used for 'left' and 'right', and window-1 rows are used when closed is set to 'neither'.
The 'neither' column in the example resulted in all NaN values because the min_periods parameter is automatically set to window for fixed windows. To avoid this problem, min_periods has to be set to a value lower than window whenever closed='neither'.
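The mental model in this comment can be checked with a short sketch. The helper window_rows below is hypothetical (it is not part of pandas) and only encodes the description given here: the current row plus window rows before it, with the first and/or last row dropped depending on closed. It assumes a pandas version that accepts closed for fixed windows, such as the 1.3.4 used in the report, and it sets min_periods=1 so that windows shorter than window still produce a value.

```python
import pandas as pd


def window_rows(i, window, closed):
    """Hypothetical helper: row positions used for row i under the mental model."""
    first, last = i - window, i      # current row plus `window` rows before it
    if closed in ("right", "neither"):
        first += 1                   # drop the first row
    if closed in ("left", "neither"):
        last -= 1                    # drop the last (current) row
    # positions before the start of the data do not exist
    return [r for r in range(first, last + 1) if r >= 0]


s = pd.Series([1] * 5 + [0] * 5)
window = 4
sums = {}
for closed in ("right", "left", "both", "neither"):
    # min_periods=1 overrides the default (min_periods=window for fixed
    # windows), which is what made the 'neither' column all NaN
    sums[closed] = s.rolling(window, closed=closed, min_periods=1).sum()
    n_rows = len(window_rows(len(s) - 1, window, closed))
    print(f"{closed:8s} rows used at the last row: {n_rows}")
    print(sums[closed].tolist())
```

With min_periods=1 the 'neither' column is no longer all NaN; only row 0, whose window is empty, stays NaN. The row counts at the last row come out as window for 'right' and 'left', window+1 for 'both', and window-1 for 'neither', in line with the description above.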