Add rolling window to sklearn.model_selection.TimeSeriesSplit

Describe the workflow you want to enable

I wanted to ask whether any plans exist to implement a rolling/sliding window method in the TimeSeriesSplit class:

[Image: https://i.ibb.co/KWkfQ7q/newplot.png]

Currently, we are limited to the expanding window type. For many financial time series models where a feature experiences a structural break, a model whose weights are trained on the entire history can prove suboptimal.

I noted in https://github.com/scikit-learn/scikit-learn/pull/13204, specifically svenstehle’s comments, that this might be on the horizon?

Describe your proposed solution

Current Implementation

>>> import numpy as np
>>> from sklearn.model_selection import TimeSeriesSplit
>>> x = np.arange(15)
>>> cv = TimeSeriesSplit(n_splits=3, gap=2)
>>> for train_index, test_index in cv.split(x):
...     print("TRAIN:", train_index, "TEST:", test_index)
TRAIN: [0 1 2 3] TEST: [6 7 8]
TRAIN: [0 1 2 3 4 5 6] TEST: [ 9 10 11]
TRAIN: [0 1 2 3 4 5 6 7 8 9] TEST: [12 13 14]

Desired outcome

>>> x = np.arange(10)
>>> cv = TimeSeriesSplit(n_splits='walk_fw', max_train_size=3, max_test_size=1)
>>> for train_index, test_index in cv.split(x):
...     print("TRAIN:", train_index, "TEST:", test_index)
TRAIN: [0 1 2] TEST: [3]
TRAIN: [1 2 3] TEST: [4]
TRAIN: [2 3 4] TEST: [5]
TRAIN: [3 4 5] TEST: [6]
TRAIN: [4 5 6] TEST: [7]
TRAIN: [5 6 7] TEST: [8]
TRAIN: [6 7 8] TEST: [9]

Should the 'stride' of the walk forward be proportional to the test set size, or should it step by the max_train_size parameter?
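
To make the stride question concrete, here is a minimal sketch of a standalone rolling-window splitter. It is not part of scikit-learn: the function name rolling_window_split and its stride parameter are hypothetical, and the default stride equal to test_size is only one of the two options raised above. Indices are returned as plain Python lists rather than NumPy arrays.

```python
def rolling_window_split(n_samples, train_size, test_size=1, stride=None):
    """Yield (train_indices, test_indices) for a fixed-size rolling window.

    Hypothetical helper for illustration only; not a scikit-learn API.
    stride defaults to test_size, so consecutive test sets do not overlap.
    """
    if stride is None:
        stride = test_size
    if train_size < 1 or test_size < 1 or stride < 1:
        raise ValueError("train_size, test_size and stride must be >= 1")

    start = 0
    # Stop as soon as a full train window plus a full test window no longer fits.
    while start + train_size + test_size <= n_samples:
        train_end = start + train_size
        train = list(range(start, train_end))
        test = list(range(train_end, train_end + test_size))
        yield train, test
        start += stride


if __name__ == "__main__":
    for train_index, test_index in rolling_window_split(10, train_size=3, test_size=1):
        print("TRAIN:", train_index, "TEST:", test_index)
```

With n_samples=10, train_size=3 and test_size=1, this produces the seven windows listed in the desired outcome. Passing stride=3 (the train size) instead yields three non-overlapping training windows.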

Describe alternatives you’ve considered, if relevant

No response

Additional context

No response

Issue Analytics

  • State: open
  • Created 2 years ago
  • Reactions: 4
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

4 reactions
thomasjpfan commented, Jun 21, 2022

The way I see it, n_splits='walk_forward' == n_splits=12, and setting n_splits=12 should include all windows, right?

n_splits=12 only works when max_train_size=3, test_size=1 and X.shape[0]=15.

For example, if max_train_size=10 and test_size=2, then n_splits needs to be set to 4 to properly walk forward:

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

x = np.arange(15)
cv = TimeSeriesSplit(n_splits=4, max_train_size=10, test_size=1)
for train_index, test_index in cv.split(x):
    print("TRAIN:", train_index, "TEST:", test_index)

n_splits="walk_forward" would automatically compute the proper value of n_splits based on X.shape[0], test_size and max_train_size.
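
As a rough sketch of what that automatic computation could look like: the helper below counts how many splits fit such that every training window has the full max_train_size samples. The function name walk_forward_n_splits is hypothetical and this is not how scikit-learn implements anything. It assumes the current TimeSeriesSplit behaviour, where test sets are laid out back to back from the end of the data and each training window ends gap samples before its test set.

```python
def walk_forward_n_splits(n_samples, max_train_size, test_size, gap=0):
    """Largest n_splits for which every training window is full-sized.

    Hypothetical helper for illustration only; not a scikit-learn API.
    Assumes test sets are taken back to back from the end of the data,
    so the first split needs max_train_size + gap samples in front of it.
    """
    if max_train_size < 1 or test_size < 1 or gap < 0:
        raise ValueError("max_train_size and test_size must be >= 1, gap >= 0")

    # Samples left over for test sets once the first full window is reserved.
    available = n_samples - max_train_size - gap
    n_splits = available // test_size
    if n_splits < 1:
        raise ValueError("not enough samples for a single full window")
    return n_splits


if __name__ == "__main__":
    print(walk_forward_n_splits(n_samples=15, max_train_size=3, test_size=1))
```

For X.shape[0]=15, max_train_size=3 and test_size=1 this gives 12, matching the explicit n_splits=12 used elsewhere in this thread. The result can then be passed as n_splits to TimeSeriesSplit together with the same max_train_size, test_size and gap. Note that, as far as I know, TimeSeriesSplit itself rejects n_splits below 2.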

2 reactions
thomasjpfan commented, Apr 14, 2022

Currently there is rolling window support, where the train set does not grow:

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

x = np.arange(15)
cv = TimeSeriesSplit(max_train_size=3, test_size=1)
for train_index, test_index in cv.split(x):
    print("TRAIN:", train_index, "TEST:", test_index)

# TRAIN: [7 8 9] TEST: [10]
# TRAIN: [ 8  9 10] TEST: [11]
# TRAIN: [ 9 10 11] TEST: [12]
# TRAIN: [10 11 12] TEST: [13]
# TRAIN: [11 12 13] TEST: [14]

If we want all the windows, we would need to adjust n_splits explicitly:

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

x = np.arange(15)
cv = TimeSeriesSplit(n_splits=12, max_train_size=3, test_size=1)
for train_index, test_index in cv.split(x):
    print("TRAIN:", train_index, "TEST:", test_index)

# TRAIN: [0 1 2] TEST: [3]
# TRAIN: [1 2 3] TEST: [4]
# TRAIN: [2 3 4] TEST: [5]
# TRAIN: [3 4 5] TEST: [6]
# TRAIN: [4 5 6] TEST: [7]
# TRAIN: [5 6 7] TEST: [8]
# TRAIN: [6 7 8] TEST: [9]
# TRAIN: [7 8 9] TEST: [10]
# TRAIN: [ 8  9 10] TEST: [11]
# TRAIN: [ 9 10 11] TEST: [12]
# TRAIN: [10 11 12] TEST: [13]
# TRAIN: [11 12 13] TEST: [14]

Is the proposal to have n_splits='walk_fw' provide all the windows automatically?
