Add rolling window to sklearn.model_selection.TimeSeriesSplit

Describe the workflow you want to enable

I wanted to ask whether any plans exist to implement a rolling/sliding window method in the TimeSeriesSplit class:

[Image: https://i.ibb.co/KWkfQ7q/newplot.png]

Currently, we are limited to using the expanding window type. For many financial time series models where a feature experiences a structural break, having a model whose weights are trained on the entire history can prove suboptimal.

I noted in https://github.com/scikit-learn/scikit-learn/pull/13204, specifically svenstehle’s comments, that this might be on the horizon?

Describe your proposed solution

Current Implementation

>>> import numpy as np
>>> from sklearn.model_selection import TimeSeriesSplit
>>> x = np.arange(15)
>>> cv = TimeSeriesSplit(n_splits=3, gap=2)
>>> for train_index, test_index in cv.split(x):
...     print("TRAIN:", train_index, "TEST:", test_index)
TRAIN: [0 1 2 3] TEST: [6 7 8]
TRAIN: [0 1 2 3 4 5 6] TEST: [ 9 10 11]
TRAIN: [0 1 2 3 4 5 6 7 8 9] TEST: [12 13 14]

Desired outcome

>>> x = np.arange(10)
>>> cv = TimeSeriesSplit(n_splits='walk_fw', max_train_size=3, max_test_size=1)
>>> for train_index, test_index in cv.split(x):
...     print("TRAIN:", train_index, "TEST:", test_index)
TRAIN: [0 1 2] TEST: [3]
TRAIN: [1 2 3] TEST: [4]
TRAIN: [2 3 4] TEST: [5]
TRAIN: [3 4 5] TEST: [6]
TRAIN: [4 5 6] TEST: [7]
TRAIN: [5 6 7] TEST: [8]
TRAIN: [6 7 8] TEST: [9]

Should the ‘stride’ of the walk forward be proportional to the test set size, or should it step by the max_train_size parameter?
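To make the two stride options concrete, here is a minimal sketch of a fixed-size rolling window written in plain Python. It does not use scikit-learn: the function name rolling_window_splits and its stride parameter are hypothetical, introduced only for illustration, and it yields plain lists rather than NumPy arrays.

```python
def rolling_window_splits(n_samples, train_size, test_size=1, stride=None):
    """Yield (train_indices, test_indices) for a fixed-size rolling window.

    Hypothetical helper, not part of scikit-learn. When `stride` is None
    it defaults to `test_size`, so consecutive test sets do not overlap.
    """
    if stride is None:
        stride = test_size
    if train_size < 1 or test_size < 1 or stride < 1:
        raise ValueError("train_size, test_size and stride must be positive")
    start = 0
    # Stop as soon as a full train + test window no longer fits.
    while start + train_size + test_size <= n_samples:
        train_end = start + train_size
        yield (
            list(range(start, train_end)),
            list(range(train_end, train_end + test_size)),
        )
        start += stride


# Stride equal to the test size: reproduces the seven windows listed above.
for train_index, test_index in rolling_window_splits(10, train_size=3, test_size=1):
    print("TRAIN:", train_index, "TEST:", test_index)

# Stride equal to the train size: windows no longer overlap, leaving three splits.
for train_index, test_index in rolling_window_splits(10, train_size=3, test_size=1, stride=3):
    print("TRAIN:", train_index, "TEST:", test_index)
```

With a stride equal to the test size, every sample after the first training window is tested exactly once. A stride equal to the train size skips most of those test points.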

Describe alternatives you’ve considered, if relevant

No response

Additional context

No response

Issue Analytics

State:
Created 2 years ago
Reactions: 4
Comments: 7 (5 by maintainers)

Top GitHub Comments

4 reactions

thomasjpfan commented, Jun 21, 2022

The way I see it n_splits=‘walk_forward’ == n_splits=‘12’ and setting n_splits=12 should include all windows right?

n_splits=12 only works when max_train_size=3, test_size=1 and X.shape[0]=15.

For example, if max_train_size=10 and test_size=2, then n_splits needs to be set to 4 to properly walk forward:

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

x = np.arange(15)
cv = TimeSeriesSplit(n_splits=4, max_train_size=10, test_size=1)
for train_index, test_index in cv.split(x):
    print("TRAIN:", train_index, "TEST:", test_index)

n_splits="walk_forward" would automatically compute the proper value of n_splits based on X.shape[0], test_size and max_train_size.
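One way such an automatic value might be computed is sketched below. This is an assumption, not scikit-learn behaviour: the helper name walk_forward_n_splits is hypothetical, and it counts only splits whose training window holds exactly max_train_size samples. It can therefore return fewer splits than a hand-picked n_splits that also allows shorter initial training windows.

```python
def walk_forward_n_splits(n_samples, max_train_size, test_size=1, gap=0):
    """Number of splits whose training window is exactly `max_train_size` long.

    Hypothetical helper, not part of scikit-learn. It assumes that test sets
    are taken from the end of the data, `test_size` samples at a time, and
    that `gap` samples separate each training window from its test set.
    """
    # Samples left over once the first full training window and gap are reserved.
    usable = n_samples - gap - max_train_size
    if usable < test_size:
        raise ValueError("not enough samples for a single full window")
    return usable // test_size


print(walk_forward_n_splits(15, max_train_size=3, test_size=1))  # 12
print(walk_forward_n_splits(10, max_train_size=3, test_size=1))  # 7
```

The first call matches the n_splits=12 case discussed in this thread, and the second matches the seven windows in the desired outcome of the issue. The result could then be passed as n_splits to TimeSeriesSplit together with the same max_train_size, test_size and gap.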

2 reactions

thomasjpfan commented, Apr 14, 2022

Currently there is rolling window support, where the train set does not grow:

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

x = np.arange(15)
cv = TimeSeriesSplit(max_train_size=3, test_size=1)
for train_index, test_index in cv.split(x):
    print("TRAIN:", train_index, "TEST:", test_index)

# TRAIN: [7 8 9] TEST: [10]
# TRAIN: [ 8  9 10] TEST: [11]
# TRAIN: [ 9 10 11] TEST: [12]
# TRAIN: [10 11 12] TEST: [13]
# TRAIN: [11 12 13] TEST: [14]

If we want all the windows, we would need to adjust n_splits explicitly:

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

x = np.arange(15)
cv = TimeSeriesSplit(n_splits=12, max_train_size=3, test_size=1)
for train_index, test_index in cv.split(x):
    print("TRAIN:", train_index, "TEST:", test_index)

# TRAIN: [0 1 2] TEST: [3]
# TRAIN: [1 2 3] TEST: [4]
# TRAIN: [2 3 4] TEST: [5]
# TRAIN: [3 4 5] TEST: [6]
# TRAIN: [4 5 6] TEST: [7]
# TRAIN: [5 6 7] TEST: [8]
# TRAIN: [6 7 8] TEST: [9]
# TRAIN: [7 8 9] TEST: [10]
# TRAIN: [ 8  9 10] TEST: [11]
# TRAIN: [ 9 10 11] TEST: [12]
# TRAIN: [10 11 12] TEST: [13]
# TRAIN: [11 12 13] TEST: [14]

Is the proposal to have n_splits='walk_fw' provide all the windows automatically?
