BUG: Rolling count aggregation produces unexpected results
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
>>> pd.__version__
'1.1.0.dev0+1690.g70d7c04ff'
>>> pd.Series([1,1,1,None]).rolling(2, min_periods=2, center=True).count()
0 NaN
1 2.0
2 2.0
3 1.0
dtype: float64
Problem description
Apologies if this is expected behavior and not a bug. I noticed that prior to version 1.0, rolling.count ignored the min_periods parameter, as discussed in https://github.com/pandas-dev/pandas/pull/30923. However, I'm having trouble understanding this output. Shouldn't the last element of the output be NaN, given that the final window includes the last and second-to-last elements, of which only one is valid?
Expected Output
0 NaN
1 2.0
2 2.0
3 NaN
dtype: float64
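For reference, here is a minimal sketch of one way to get the expected output above without relying on how rolling.count treats min_periods. It counts the non-NA values per window with a rolling sum over notna(), then masks windows that hold too few of them. The names `valid`, `expected` and `min_valid` are only illustrative, and this is a workaround under those assumptions, not a description of how pandas implements count.

```python
import pandas as pd

s = pd.Series([1, 1, 1, None])
window, min_valid = 2, 2  # illustrative names, not pandas parameters

# Count non-NA observations in each window. notna() has no missing values,
# so min_periods=1 keeps every window, including the truncated first one.
valid = s.notna().astype(float).rolling(window, min_periods=1, center=True).sum()

# Keep a count only where the window holds at least `min_valid` non-NA values.
expected = valid.where(valid >= min_valid)
print(expected)
```

With this input, the first and last positions are masked because their windows each contain only one valid value.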
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 70d7c04ff585de361622e4fe1788480a7a4526b5
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-76-generic
Version : #86-Ubuntu SMP Fri Jan 17 17:24:28 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.0.dev0+1690.g70d7c04ff
numpy : 1.18.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.4.0.post20200518
Cython : 0.29.17
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None
Issue Analytics
- Created 3 years ago
- Comments: 14 (7 by maintainers)
Top GitHub Comments
Having a value means not-missing, i.e., not null.
I'm in favor of consistency with other methods. I understand that … but counting non-NA values is as easy as setting min_periods=0, right? Vice versa, counting the number of non-NA values whenever there are at least a minimum number of non-NA values is the goal of min_periods, and it is more annoying to do otherwise.
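To illustrate the min_periods=0 point, here is a small sketch using the same series as the report. It assumes that with min_periods=0 no window is masked, so count returns the plain number of non-NA values per window.

```python
import pandas as pd

s = pd.Series([1, 1, 1, None])

# min_periods=0 means no window is dropped for having too few observations,
# so each result is simply the number of non-NA values in that window.
counts = s.rolling(2, min_periods=0).count()
print(counts.tolist())
```

The first window holds a single element and the last holds one valid value plus the missing one, so both report a count of one rather than NaN.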