BUG: Rolling count aggregation produces unexpected results

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

>>> pd.__version__
'1.1.0.dev0+1690.g70d7c04ff'
>>> pd.Series([1,1,1,None]).rolling(2, min_periods=2, center=True).count()
0    NaN
1    2.0
2    2.0
3    1.0
dtype: float64

Problem description

Apologies if this is expected behavior and not a bug. I noticed that prior to version 1.0, rolling.count ignored the min_periods parameter, as discussed in https://github.com/pandas-dev/pandas/pull/30923. However, I’m having trouble understanding this output. Shouldn’t the last element of the output be NaN, given that the final window includes the last and second-to-last elements, of which only one is valid?

Expected Output

0    NaN
1    2.0
2    2.0
3    NaN
dtype: float64
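
The difference between the reported and expected outputs can be sketched in plain Python. This is only an illustration, not pandas' implementation: `rolling_count` and its `require_valid` flag are hypothetical names, and the window at position i is assumed to cover positions i-1 and i, following the description above. Comparing `min_periods` against the number of non-missing values gives the expected output, while comparing it against the number of positions in the window (missing or not) happens to reproduce the reported one.

```python
import math


def rolling_count(values, window, min_periods, require_valid=True):
    """Trailing-window count of non-missing values (illustrative only).

    require_valid=True:  min_periods is checked against the number of
                         non-missing values in the window.
    require_valid=False: min_periods is checked against the number of
                         positions in the window, missing or not.
    """
    out = []
    for i in range(len(values)):
        # Window ends at position i and holds at most `window` entries.
        chunk = values[max(0, i - window + 1): i + 1]
        valid = sum(v is not None for v in chunk)
        observed = valid if require_valid else len(chunk)
        out.append(float(valid) if observed >= min_periods else math.nan)
    return out


data = [1, 1, 1, None]
expected = rolling_count(data, window=2, min_periods=2)
reported = rolling_count(data, window=2, min_periods=2, require_valid=False)
print(expected)  # [nan, 2.0, 2.0, nan]
print(reported)  # [nan, 2.0, 2.0, 1.0]
```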

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : 70d7c04ff585de361622e4fe1788480a7a4526b5
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-76-generic
Version : #86-Ubuntu SMP Fri Jan 17 17:24:28 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.0.dev0+1690.g70d7c04ff
numpy : 1.18.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.4.0.post20200518
Cython : 0.29.17
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader : None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

Issue Analytics

Created 3 years ago
Comments: 14 (7 by maintainers)

Top GitHub Comments

1 reaction

bashtage commented, Sep 30, 2020

“Minimum number of observations in window required to have a value”

Having a value means not-missing, i.e., not null.

1 reaction

toobaz commented, Sep 30, 2020

I’m in favor of consistency with other methods. I understand that

“Keeping this behavior would give more useful results for users, especially considering count method is meant for counting non-NA values.”

… but counting non-NA values is as easy as setting min_periods=0, right? Vice versa, counting the number of non-NA values whenever there are at least a minimum number of non-NA values is the goal of min_periods, and it is more annoying to do otherwise.
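
As a rough illustration of the first point, passing min_periods=0 asks rolling count to report the number of non-NA values in every window, partial windows included. This is a minimal sketch that reuses the series from the report and assumes a pandas version in which `count` accepts `min_periods=0`:

```python
import pandas as pd

s = pd.Series([1, 1, 1, None])

# min_periods=0 disables the minimum-observation check, so each window
# simply reports how many non-NA values it contains.
counts = s.rolling(2, min_periods=0).count()
print(counts.tolist())
```

With the check disabled, no entry is masked as NaN; raising min_periods then becomes an explicit choice to blank out windows with too few valid observations.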