Time-based rolling apply inconsistent behavior on NaN rows

See original GitHub issue

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np

s = pd.Series([1, 2, 3, np.nan, 5], index=pd.date_range('2017-01-01', '2017-01-05'))
print(s.rolling('3d', min_periods=1).apply(lambda x: 42))

Problem description

Output of the above code sample:

2017-01-01    42.0
2017-01-02    42.0
2017-01-03    42.0
2017-01-04     NaN
2017-01-05    42.0

It seems that the user function was not applied to the window ending at the original NaN row, so the result for that row is NaN. Why is that? A more reasonable output would be 42.0 for every row: with min_periods=1, every window contains at least one non-NaN observation and is therefore valid.

Compare this to the equivalent fixed window version:

s = pd.Series([1, 2, 3, np.nan, 5])
print(s.rolling(3, min_periods=1).apply(lambda x: 42))

which gives the following output:

0    42.0
1    42.0
2    42.0
3    42.0
4    42.0

Is this inconsistency between fixed-size and variable-size windows intended behavior?

Expected Output

2017-01-01    42.0
2017-01-02    42.0
2017-01-03    42.0
2017-01-04    42.0
2017-01-05    42.0
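
To make the expectation concrete, here is a minimal pure-Python sketch of the semantics the report assumes. It is not pandas' implementation: the helper name `rolling_apply_by_time` is hypothetical, and it assumes each window covers the half-open interval (t - window, t] and that `min_periods` counts only non-NaN observations.

```python
import math
from datetime import date, timedelta


def rolling_apply_by_time(index, values, window, min_periods, func):
    """Hypothetical reference sketch: apply func over each window (t - window, t]."""
    results = []
    for t in index:
        # Collect the values whose timestamps fall inside (t - window, t].
        in_window = [v for ts, v in zip(index, values) if t - window < ts <= t]
        # Assumption: min_periods counts only the non-NaN observations.
        valid = sum(1 for v in in_window if not math.isnan(v))
        if valid >= min_periods:
            results.append(float(func(in_window)))
        else:
            results.append(math.nan)
    return results


index = [date(2017, 1, 1) + timedelta(days=i) for i in range(5)]
values = [1.0, 2.0, 3.0, math.nan, 5.0]

result = rolling_apply_by_time(index, values, timedelta(days=3), 1, lambda x: 42)
print(result)
```

Under these assumptions the window ending on 2017-01-04 holds the values 2, 3 and NaN, which is two valid observations, so the function should be called for that row as well.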

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Linux
OS-release: 4.11.9-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.21.0.dev+316.gf2b0bdc9b
pytest: None
pip: 9.0.1
setuptools: 36.2.5
Cython: 0.26
numpy: 1.13.1
scipy: None
pyarrow: None
xarray: None
IPython: 5.4.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

gfyoung commented, Aug 22, 2017 (1 reaction)

@YS-L : No worries! We don’t expect everyone to pull master and build pandas from source, though since you’re an Arch user, I guess I would expect that from you 😉

But actually, thanks for taking the time to do this. We’re happy to see that we anticipated your problem 😄

YS-L commented, Aug 22, 2017 (0 reactions)

Yes, pulled master to confirm.

Thought I found a fix after some digging into window.pyx, just to discover that it had been applied there on master. Note to self: should always test on latest master.
