
Implement function for model evaluation in forecasting

See original GitHub issue

Is your feature request related to a problem? Please describe.
https://github.com/sktime/enhancement-proposals/pull/8/ #622 #64

Describe the solution you’d like
A function to evaluate a forecaster, something along the following lines:

 import numpy as np


 def evaluate(forecaster, y, fh, X=None, cv=None, strategy="refit", scoring=None):
     """Evaluate forecaster using cross-validation."""

     # check cv, compatibility with fh
     # check strategy, e.g. assert strategy in ("refit", "update"), compatibility with cv
     # check scoring

     # pre-allocate score array
     n_splits = cv.get_n_splits(y)
     scores = np.empty(n_splits)

     for i, (train, test) in enumerate(cv.split(y)):
         # split data into training and held-out windows
         y_train = y.iloc[train]
         y_test = y.iloc[test]
         # split X too

         # fit and predict
         forecaster.fit(y_train, fh=fh)  # pass X too
         y_pred = forecaster.predict()

         # score predictions against the held-out window
         scores[i] = scoring(y_test, y_pred)

     # return scores, possibly aggregate
     return scores
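
For illustration, a call to such a function might look like the sketch below. It assumes sktime's load_airline, NaiveForecaster, SlidingWindowSplitter and smape_loss are available; module paths and metric names have shifted across sktime versions, so treat these as assumptions rather than a fixed API:

 import numpy as np
 from sktime.datasets import load_airline
 from sktime.forecasting.model_selection import SlidingWindowSplitter
 from sktime.forecasting.naive import NaiveForecaster
 from sktime.performance_metrics.forecasting import smape_loss  # name varies by version

 y = load_airline()
 fh = np.arange(1, 13)  # forecast 12 steps ahead
 cv = SlidingWindowSplitter(fh=fh, window_length=36)

 forecaster = NaiveForecaster(strategy="last")
 scores = evaluate(forecaster, y, fh, cv=cv, scoring=smape_loss)
 print(scores.mean())  # aggregate score over all CV folds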

I’ll ping you @ViktorKaz as you’ve worked on this before.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5

Top GitHub Comments

2 reactions
thomhomsma commented, Feb 16, 2021

My issue is related to this topic, so I will try to explain as clearly as possible what I would like to achieve with sktime.

The problem I am facing is that in the sktime forecasting tutorial, all examples forecast the full test set at once.

In my case, I would like to fit an auto-ARIMA model on the train set (e.g. 80% of the data). Subsequently, I would like to cross-validate the performance of the model on the test set (the final 20% of the data) with a sliding window. Since my forecast horizon is 16 steps, I would like to monitor the performance on the test set for every lead time, as well as the mean performance over all lead times, using one or more standard error metrics (as illustrated in the figures). Because I am comparing against deep learning models that do not use the test data to refit or update their parameters, I would like to fit the ARIMA model on the training data only.

I have included two figures to illustrate what my intention is.

[two attached figures]

If someone is currently working on this functionality I would be more than happy to help with the development.
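
As a rough illustration of the workflow described above (fit once on the training data, then evaluate every lead time over a sliding window on the test set), a minimal sketch might look like the following. It is not an existing sktime function; it assumes sktime's load_airline, AutoARIMA, temporal_train_test_split and the forecaster update() method, whose exact signatures may differ across versions:

 import numpy as np
 from sktime.datasets import load_airline
 from sktime.forecasting.arima import AutoARIMA
 from sktime.forecasting.model_selection import temporal_train_test_split

 y = load_airline()
 y_train, y_test = temporal_train_test_split(y, test_size=0.2)
 fh = np.arange(1, 17)  # 16-step forecast horizon

 forecaster = AutoARIMA()
 forecaster.fit(y_train, fh=fh)  # fit on the training data only

 errors_per_origin = []
 for start in range(len(y_test) - len(fh) + 1):
     if start > 0:
         # advance the cutoff with the newly observed point, without re-estimating parameters
         forecaster.update(y_test.iloc[start - 1:start], update_params=False)
     y_pred = forecaster.predict(fh)
     y_true = y_test.iloc[start:start + len(fh)]
     # absolute error per lead time (1..16) for this forecast origin
     errors_per_origin.append(np.abs(y_true.to_numpy() - y_pred.to_numpy()))

 errors = np.vstack(errors_per_origin)
 mae_per_lead_time = errors.mean(axis=0)  # one value per step ahead
 overall_mae = errors.mean()              # mean over all lead times and origins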

0 reactions
mloning commented, Mar 23, 2021

Closed by #657
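
For reference, later sktime releases expose an evaluate utility along these lines in sktime.forecasting.model_evaluation; the sketch below is an assumption about its usage, and the exact arguments and defaults may differ by version:

 import numpy as np
 from sktime.datasets import load_airline
 from sktime.forecasting.model_evaluation import evaluate  # assumed module path
 from sktime.forecasting.model_selection import SlidingWindowSplitter
 from sktime.forecasting.naive import NaiveForecaster

 y = load_airline()
 cv = SlidingWindowSplitter(fh=np.arange(1, 13), window_length=36)
 results = evaluate(forecaster=NaiveForecaster(strategy="last"), cv=cv, y=y, strategy="refit")
 print(results)  # one row per split with the test score and fit/predict times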


Top Results From Across the Web

Build Evaluation Framework for Forecast Models | by Ajay Tiwari
The goal of any time series forecasting model is to make accurate predictions. The popular machine learning evaluation techniques like ...

3.4 Evaluating forecast accuracy - OTexts
The accuracy of forecasts can only be determined by considering how well a model performs on new data that were not used when...

Evaluating Forecasting Methods - ScholarlyCommons
Evaluation consists of four steps: testing assumptions, testing data and methods, replicating outputs, and assessing outputs.

How to Select a Model For Your Time Series Prediction Task ...
Working with time series data? Here's a guide for you. In this article, you will learn how to compare and select time series...

Forecast evaluation
Intro. This vignette provides a short overview of the basics of forecast evaluation with the functions from the onlineforecast package.
