Add rolling window to sklearn.model_selection.TimeSeriesSplit
Describe the workflow you want to enable
I wanted to ask whether any plans exist to implement a rolling/sliding window method in the TimeSeriesSplit class.
Currently, we are limited to using the expanding window type. For many financial time series models where a feature experiences a structural break, having a model whose weights are trained on the entire history can prove suboptimal.
I noted in https://github.com/scikit-learn/scikit-learn/pull/13204, specifically svenstehle’s comments, that this might be on the horizon?
Describe your proposed solution
Current Implementation
>>> import numpy as np
>>> from sklearn.model_selection import TimeSeriesSplit
>>> x = np.arange(15)
>>> cv = TimeSeriesSplit(n_splits=3, gap=2)
>>> for train_index, test_index in cv.split(x):
...     print("TRAIN:", train_index, "TEST:", test_index)
TRAIN: [0 1 2 3] TEST: [6 7 8]
TRAIN: [0 1 2 3 4 5 6] TEST: [ 9 10 11]
TRAIN: [0 1 2 3 4 5 6 7 8 9] TEST: [12 13 14]
Desired outcome
>>> x = np.arange(10)
>>> cv = TimeSeriesSplit(n_splits='walk_fw', max_train_size=3, max_test_size=1)
>>> for train_index, test_index in cv.split(x):
...     print("TRAIN:", train_index, "TEST:", test_index)
TRAIN: [0 1 2] TEST: [3]
TRAIN: [1 2 3] TEST: [4]
TRAIN: [2 3 4] TEST: [5]
TRAIN: [3 4 5] TEST: [6]
TRAIN: [4 5 6] TEST: [7]
TRAIN: [5 6 7] TEST: [8]
TRAIN: [6 7 8] TEST: [9]
An open question: should the ‘stride’ of the walk forward be proportional to the test set size, or should the window advance by the max_train_size parameter?
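To make the desired behaviour concrete, here is a minimal pure-Python sketch of a walk-forward splitter. It is not part of scikit-learn: the function name walk_forward_split and its stride parameter are hypothetical, and the default of stepping by test_size is only one possible answer to the stride question above.

```python
from typing import Iterator, List, Optional, Tuple


def walk_forward_split(
    n_samples: int,
    train_size: int,
    test_size: int = 1,
    stride: Optional[int] = None,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for a fixed-size sliding window.

    Hypothetical helper for illustration only; not a scikit-learn API.
    `stride` defaults to `test_size`, so consecutive test sets do not overlap.
    """
    if train_size < 1 or test_size < 1:
        raise ValueError("train_size and test_size must be >= 1")
    step = test_size if stride is None else stride
    if step < 1:
        raise ValueError("stride must be >= 1")

    start = 0
    # Keep sliding while a full train window plus a full test window still fits.
    while start + train_size + test_size <= n_samples:
        train_end = start + train_size
        train = list(range(start, train_end))
        test = list(range(train_end, train_end + test_size))
        yield train, test
        start += step


if __name__ == "__main__":
    for train_index, test_index in walk_forward_split(10, train_size=3, test_size=1):
        print("TRAIN:", train_index, "TEST:", test_index)
```

With n_samples=10, train_size=3 and test_size=1 this yields the seven windows listed under "Desired outcome". Passing stride=train_size instead would advance the window by the training length, which is the other option raised above.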
Describe alternatives you’ve considered, if relevant
No response
Additional context
No response
Issue Analytics
- State:
- Created 2 years ago
- Reactions: 4
- Comments: 7 (5 by maintainers)
Top GitHub Comments
n_splits=12 only works when max_train_size=3, test_size=1 and X.shape[0]=15. For example, if max_train_size=10 and test_size=2, then n_splits needs to be set to 4 to properly walk forward. n_splits="walk_forward" would automatically compute the proper value of n_splits based on X.shape[0], test_size and max_train_size.
Currently there is rolling window support through max_train_size, where the train set does not grow. If we want all the windows, we would need to adjust n_splits explicitly. Is the proposal to have n_splits='walk_fw' provide all the windows automatically?
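The code snippets from these comments did not survive on this page. As a hedged sketch of the idea, the helper below computes the n_splits value that walks forward over every full window, then passes it to the existing TimeSeriesSplit parameters. The names walk_forward_n_splits and rolling_splits are hypothetical, the formula is an assumption about what "all the windows" means (every training window has exactly max_train_size samples), and the test_size and gap arguments assume scikit-learn 0.24 or later.

```python
def walk_forward_n_splits(
    n_samples: int, max_train_size: int, test_size: int = 1, gap: int = 0
) -> int:
    """Number of splits so that every train window is exactly max_train_size long.

    Hypothetical helper, not a scikit-learn API.
    """
    if max_train_size < 1 or test_size < 1 or gap < 0:
        raise ValueError("max_train_size and test_size must be >= 1, gap >= 0")
    # Samples left for test windows once the first train window and gap are reserved.
    usable = n_samples - gap - max_train_size
    n_splits = usable // test_size
    if n_splits < 1:
        raise ValueError("not enough samples for a single full window")
    return n_splits


def rolling_splits(n_samples: int, max_train_size: int, test_size: int = 1, gap: int = 0):
    """Return all rolling (train, test) index lists using TimeSeriesSplit."""
    # Imported here so the helper above stays dependency-free.
    import numpy as np
    from sklearn.model_selection import TimeSeriesSplit

    cv = TimeSeriesSplit(
        n_splits=walk_forward_n_splits(n_samples, max_train_size, test_size, gap),
        max_train_size=max_train_size,
        test_size=test_size,
        gap=gap,
    )
    return [(tr.tolist(), te.tolist()) for tr, te in cv.split(np.arange(n_samples))]


if __name__ == "__main__":
    for train_index, test_index in rolling_splits(15, max_train_size=3, test_size=1):
        print("TRAIN:", train_index, "TEST:", test_index)
```

For X.shape[0]=15, max_train_size=3 and test_size=1 the helper returns 12, matching the first case quoted above. Note that TimeSeriesSplit itself rejects n_splits below 2, so very short series still need separate handling.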