BUG: Rolling count aggregation produces unexpected results

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


>>> import pandas as pd
>>> pd.__version__
'1.1.0.dev0+1690.g70d7c04ff'
>>> pd.Series([1,1,1,None]).rolling(2, min_periods=2, center=True).count()
0    NaN
1    2.0
2    2.0
3    1.0
dtype: float64

Problem description

Apologies if this is expected behavior and not a bug. I noticed that prior to version 1.0, rolling.count ignored the min_periods parameter, as discussed in https://github.com/pandas-dev/pandas/pull/30923. However, I’m having trouble understanding this output. Shouldn’t the last element of the output be NaN, given that the final window includes the last and second-to-last elements, of which only one is valid?

Expected Output

0    NaN
1    2.0
2    2.0
3    NaN
dtype: float64
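The gap between the two outputs can be reproduced outside pandas by changing what min_periods is compared against. In the minimal sketch below, `count_nonmissing` and its `threshold_on` parameter are hypothetical (not pandas' implementation). Windows are trailing, matching the description of the final window as the last and second-to-last elements; how pandas positions a centered window of even size is not modeled. Comparing min_periods against every position in the window, missing or not, yields the reported output, while comparing it against non-missing values only yields the expected output.

```python
import math


def _is_missing(v):
    # Treat None and float NaN as missing, loosely mirroring pandas' notion of NA.
    return v is None or (isinstance(v, float) and math.isnan(v))


def count_nonmissing(values, window, min_periods, threshold_on="valid"):
    """Trailing-window count of non-missing values (hypothetical helper).

    threshold_on="all":   min_periods is checked against every position in the window.
    threshold_on="valid": min_periods is checked against non-missing values only.
    """
    out = []
    for i in range(len(values)):
        # Trailing window [i - window + 1, i], clipped at the start of the data.
        chunk = values[max(0, i - window + 1): i + 1]
        n_valid = sum(not _is_missing(v) for v in chunk)
        n_checked = len(chunk) if threshold_on == "all" else n_valid
        out.append(float(n_valid) if n_checked >= min_periods else math.nan)
    return out


data = [1, 1, 1, None]
print(count_nonmissing(data, 2, 2, threshold_on="all"))    # [nan, 2.0, 2.0, 1.0]
print(count_nonmissing(data, 2, 2, threshold_on="valid"))  # [nan, 2.0, 2.0, nan]
```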

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 70d7c04ff585de361622e4fe1788480a7a4526b5
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-76-generic
Version : #86-Ubuntu SMP Fri Jan 17 17:24:28 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.0.dev0+1690.g70d7c04ff
numpy : 1.18.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.4.0.post20200518
Cython : 0.29.17
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 14 (7 by maintainers)

Top GitHub Comments

1 reaction
bashtage commented, Sep 30, 2020

Minimum number of observations in window required to have a value

Having a value means not missing, i.e., not null.
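If that reading is applied uniformly, every aggregation needs at least min_periods non-missing values in the window before it produces a result, whether it is a sum or a count. The sketch below applies that rule to both; `rolling_apply` is a hypothetical helper (not pandas' `Rolling.apply`), windows are trailing, centering is not modeled, and the input list is purely illustrative.

```python
import math


def rolling_apply(values, window, min_periods, func):
    """Apply `func` to the non-missing values of each trailing window.

    Hypothetical helper: a result is produced only if the window holds at
    least `min_periods` non-missing values; otherwise the result is NaN.
    """
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        # Keep only non-missing entries (None and float NaN are dropped).
        present = [
            v for v in chunk
            if v is not None and not (isinstance(v, float) and math.isnan(v))
        ]
        out.append(float(func(present)) if len(present) >= min_periods else math.nan)
    return out


data = [1, 2, 3, None]  # illustrative input
print(rolling_apply(data, 2, 2, sum))  # [nan, 3.0, 5.0, nan]
print(rolling_apply(data, 2, 2, len))  # [nan, 2.0, 2.0, nan]
```

Under this rule, the last window (`3` and a missing value) yields NaN for the count just as it does for the sum.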

1 reaction
toobaz commented, Sep 30, 2020

I’m in favor of consistency with other methods. I understand that

Keeping this behavior would give more useful results for users, especially considering count method is meant for counting non-NA values.

… but counting non-NA values is as easy as setting min_periods=0, right? Vice versa, counting the number of non-NA values whenever there are at least a minimum number of non-NA values is the goal of min_periods, and it is more annoying to do otherwise.
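A minimal sketch of this argument, assuming min_periods counts non-missing values (the helpers `plain_counts` and `mask_below` are hypothetical, not pandas API; windows are trailing). With min_periods=0 nothing is masked, so the result is just the count of non-missing values. Going the other way, recovering the thresholded count from plain counts takes an extra masking step on the result.

```python
import math


def plain_counts(values, window):
    # Non-missing count in each trailing window with no masking,
    # i.e. what the non-missing reading gives with min_periods=0.
    counts = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        counts.append(float(sum(
            v is not None and not (isinstance(v, float) and math.isnan(v))
            for v in chunk
        )))
    return counts


def mask_below(counts, min_valid):
    # Replace counts smaller than min_valid with NaN, emulating min_periods after the fact.
    return [c if c >= min_valid else math.nan for c in counts]


counts = plain_counts([1, 1, 1, None], window=2)
print(counts)                 # [1.0, 2.0, 2.0, 1.0]
print(mask_below(counts, 2))  # [nan, 2.0, 2.0, nan]
```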
