ENH: Support rolling over one level of a MultiIndex
- I have searched the [pandas] tag on StackOverflow for similar questions.
- I have asked my usage related question on StackOverflow.
I had a hard time understanding how df.rolling works when df is indexed by a MultiIndex.
This is an example data frame:
import pandas as pd

idx = pd.MultiIndex.from_product(
    [pd.date_range("2020-01-01", "2020-01-10"), ["a", "b"]],
    names=["date", "obs"],
)
df = pd.DataFrame(index=idx)
df["c1"] = range(len(df))
print(df)
which outputs
                c1
date       obs
2020-01-01 a     0
           b     1
2020-01-02 a     2
           b     3
2020-01-03 a     4
           b     5
2020-01-04 a     6
           b     7
2020-01-05 a     8
           b     9
2020-01-06 a    10
           b    11
2020-01-07 a    12
           b    13
2020-01-08 a    14
           b    15
2020-01-09 a    16
           b    17
2020-01-10 a    18
           b    19
Now I want to apply a rolling window on the date level, keeping the obs level separate.
I tried, with no success, the obvious and simple (least surprise) commands like
df.rolling("7d", index="date")
or
df.rolling("7d", on="date")
but finally the desired result is obtained by
df_r = (
    df.groupby(by="obs", group_keys=False)
    .rolling("7d", on=df.index.levels[0])
    .mean()
    .sort_index()
)
print(df_r)
which gives me the correct result:
                  c1
date       obs
2020-01-01 a     0.0
           b     1.0
2020-01-02 a     1.0
           b     2.0
2020-01-03 a     2.0
           b     3.0
2020-01-04 a     3.0
           b     4.0
2020-01-05 a     4.0
           b     5.0
2020-01-06 a     5.0
           b     6.0
2020-01-07 a     6.0
           b     7.0
2020-01-08 a     8.0
           b     9.0
2020-01-09 a    10.0
           b    11.0
2020-01-10 a    12.0
           b    13.0
It seems to me that this should be quite a common situation, so I was wondering if there is a simpler way to obtain the same result. By the way, my solution is not very robust, because it relies on hidden assumptions about how the objects returned by groupby are indexed, which do not necessarily hold for a generic data frame.
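For a regular layout like the example above (every date paired with every obs value exactly once), one possibly simpler route is to pivot the obs level into columns, roll over the remaining DatetimeIndex, and stack back. This is only a sketch under that assumption, not something proposed in the issue; the names wide and rolled are made up for illustration.

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [pd.date_range("2020-01-01", "2020-01-10"), ["a", "b"]],
    names=["date", "obs"],
)
df = pd.DataFrame({"c1": range(len(idx))}, index=idx)

# Move the "obs" level into columns: one column per observation,
# indexed by a plain DatetimeIndex (assumes unique (date, obs) pairs).
wide = df["c1"].unstack("obs")

# The time-based window is now applied to each column independently;
# stack() restores the (date, obs) MultiIndex.
rolled = wide.rolling("7D").mean().stack().rename("c1").to_frame()

print(rolled)
```

This avoids passing an index object to on, but it would not work as written if some (date, obs) pairs were duplicated.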
Moreover, the doc of the on parameter in rolling was almost incomprehensible to me: I’m still wondering whether my usage, rolling("7d", on=df.index.levels[0]), is the intended one or not.
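For comparison, the usage of on that I am reasonably sure is supported is passing the label of a datetime-like column of a flat DataFrame. A minimal sketch, with flat and out as illustrative names and a 3-day window chosen only to keep the output short:

```python
import pandas as pd

flat = pd.DataFrame(
    {
        "date": pd.date_range("2020-01-01", periods=5),
        "c1": range(5),
    }
)

# `on` names the datetime-like column that defines the window,
# so the default integer index plays no role in the windowing.
out = flat.rolling("3D", on="date").mean()

print(out)
```

This says nothing about whether passing an index object, as in the workaround above, is an intended use.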
Issue Analytics
- Created 3 years ago
- Reactions: 5
- Comments: 8 (3 by maintainers)
Top GitHub Comments
I’ve run into this issue as well.
The documentation for df.rolling() states that on= should be: “a column label or Index level on which to calculate the rolling window”. My expectation was that I could pass the name of a MultiIndex level and .rolling() would group rows by unique index level values. This all might be better handled by .groupby(), but I’d love to see more integrated MultiIndex handling where convenient.

In @daskol’s example the level keyword seems to be a no-op, like any other arg passed to rolling. I’m still very confused.
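On the remark above that this might be better handled by .groupby(): a minimal sketch of that route, assuming the data frame from the original post. The obs level is turned into a column so that date alone remains as a DatetimeIndex, and the window is then applied within each group. The name rolled is illustrative, and I have not checked this against every pandas version.

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [pd.date_range("2020-01-01", "2020-01-10"), ["a", "b"]],
    names=["date", "obs"],
)
df = pd.DataFrame({"c1": range(len(idx))}, index=idx)

rolled = (
    df.reset_index(level="obs")  # "obs" becomes a column, "date" stays as the index
    .groupby("obs")["c1"]
    .rolling("7D")               # time-based window over the dates, per group
    .mean()
    .swaplevel()                 # (obs, date) -> (date, obs)
    .sort_index()
)

print(rolled)
```

The result is a Series named c1, indexed by (date, obs) like the original frame.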