ENH: Support rolling over one level of a MultiIndex

I have searched the [pandas] tag on StackOverflow for similar questions.
I have asked my usage related question on StackOverflow.

I had a hard time understanding how df.rolling works when df is indexed by a MultiIndex.

This is an example data frame:

import pandas as pd
idx = pd.MultiIndex.from_product(
    [pd.date_range("2020-01-01", "2020-1-10"), ["a", "b"]], names=["date", "obs"],
)
df = pd.DataFrame(index=idx)
df['c1'] = range(len(df))

print(df)

which outputs

                c1
date       obs    
2020-01-01 a     0
           b     1
2020-01-02 a     2
           b     3
2020-01-03 a     4
           b     5
2020-01-04 a     6
           b     7
2020-01-05 a     8
           b     9
2020-01-06 a    10
           b    11
2020-01-07 a    12
           b    13
2020-01-08 a    14
           b    15
2020-01-09 a    16
           b    17
2020-01-10 a    18
           b    19

Now I want to apply a rolling window on the date level, keeping the obs level separate.

I tried, without success, the obvious and simple (least-surprise) commands such as

df.rolling("7d", index="date") or
df.rolling("7d", on="date")

but in the end I obtained the desired result with

df_r = df.groupby(by="obs", group_keys=False).rolling(
    "7d", on=df.index.levels[0]
).mean().sort_index()

print(df_r)

which gives me the correct result:

                  c1
date       obs      
2020-01-01 a     0.0
           b     1.0
2020-01-02 a     1.0
           b     2.0
2020-01-03 a     2.0
           b     3.0
2020-01-04 a     3.0
           b     4.0
2020-01-05 a     4.0
           b     5.0
2020-01-06 a     5.0
           b     6.0
2020-01-07 a     6.0
           b     7.0
2020-01-08 a     8.0
           b     9.0
2020-01-09 a    10.0
           b    11.0
2020-01-10 a    12.0
           b    13.0
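
As a sanity check on the table above, a single entry can be recomputed by hand. The sketch below rebuilds only the "a" series (the names a, t, window and manual_mean are illustrative, not from the issue) and assumes the default behaviour of an offset window, which is closed on the right: the window ending at 2020-01-08 covers the seven days 2020-01-02 through 2020-01-08 and excludes 2020-01-01.

```python
import pandas as pd

# c1 values for obs "a" in the example frame: 0, 2, 4, ..., 18
dates = pd.date_range("2020-01-01", "2020-01-10")
a = pd.Series(range(0, 20, 2), index=dates)

# A "7D" window ending at t is the half-open interval (t - 7 days, t]
t = pd.Timestamp("2020-01-08")
window = a[(a.index > t - pd.Timedelta("7D")) & (a.index <= t)]

manual_mean = window.mean()            # mean of 2, 4, ..., 14
rolled_mean = a.rolling("7D").mean()[t]
print(manual_mean, rolled_mean)
```

Both values should agree with the 8.0 shown for (2020-01-08, a).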

It seems to me that this should be quite a common situation, so I was wondering whether there is a simpler way to obtain the same result. Incidentally, my solution is not very robust: it relies on hidden assumptions about how the objects returned by groupby are indexed, and these do not necessarily hold for a generic data frame.

Moreover, the documentation of the on parameter of rolling was almost incomprehensible to me: I am still wondering whether my usage, rolling("7d", on=df.index.levels[0]), is the intended one.
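
One possible way to avoid passing df.index.levels[0] is sketched below. It is not taken from the issue or confirmed by the maintainers there; it assumes that moving the obs level into a column, so that date is left as a plain DatetimeIndex, lets groupby(...).rolling("7D") roll over the dates within each obs group. The name rolled is illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question
idx = pd.MultiIndex.from_product(
    [pd.date_range("2020-01-01", "2020-01-10"), ["a", "b"]],
    names=["date", "obs"],
)
df = pd.DataFrame({"c1": range(len(idx))}, index=idx)

rolled = (
    df.reset_index("obs")      # "obs" becomes a column; index is now just "date"
    .groupby("obs")["c1"]      # one independent time series per obs value
    .rolling("7D")             # time-based window on the DatetimeIndex
    .mean()                    # result is indexed by (obs, date)
    .swaplevel()               # restore the original (date, obs) level order
    .sort_index()
    .to_frame()
)
print(rolled)
```

This sketch relies on the dates being sorted within each obs group, which holds for the example frame; for unsorted data, a sort_index() before the reset_index would presumably be needed.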

Issue Analytics

State:
Created 3 years ago
Reactions: 5
Comments: 8 (3 by maintainers)

Top GitHub Comments

4 reactions

d-otto commented, Dec 6, 2022

I’ve run into this issue as well.

The documentation for df.rolling() states that on= should be “a column label or Index level on which to calculate the rolling window”. My expectation was that I could pass the name of a MultiIndex level and .rolling() would group rows by the unique values of that level. This might all be better handled by .groupby(), but I’d love to see more integrated MultiIndex handling where convenient.

2 reactions

miccoli commented, Jun 8, 2020

In @daskol's example, the level keyword seems to be a no-op, as does any other argument passed to rolling.

from pandas.testing import assert_frame_equal

assert_frame_equal(
    df.groupby(by="obs", group_keys=False).rolling(7).mean(),
    df.groupby(by="obs", group_keys=False).rolling(7, level=1).mean(),
)
assert_frame_equal(
    df.groupby(by="obs", group_keys=False).rolling(7).mean(),
    df.groupby(by="obs", group_keys=False).rolling(7, foo=8, bar='8').mean(),
)

I’m still very confused.
