[BUG] AutoARIMA not working with Temporal Cross Validation
Describe the bug
I would like statistical models such as ARIMA, ETS, etc. to work with the Temporal Cross Validation flow. I am not able to reproduce the results of the standalone ARIMA when using the Temporal Cross Validation flow.
To Reproduce
Setup
y = load_airline()
y_train, y_test = temporal_train_test_split(y, test_size=36)
fh = ForecastingHorizon(np.arange(len(y_test)) + 1, is_relative=True)
Regular AutoARIMA (Baseline for test)
forecaster = AutoARIMA(sp=12, suppress_warnings=True)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh)
plot_series(y_train, y_test, y_pred, labels=["y_train", "y_test", "y_pred"]);
smape_loss(y_test, y_pred)
0.04117062370076287
Using Temporal Cross Validation
Version 1: ‘sp’ in forecaster_param_grid
(does not work)
forecaster_param_grid = {'sp': [12]}
forecaster = AutoARIMA(suppress_warnings=True)
cv = SlidingWindowSplitter(initial_window=int(len(y_train) * 0.90), start_with_window=True)
gscv = ForecastingGridSearchCV(forecaster, cv=cv, param_grid=forecaster_param_grid, verbose=True)
gscv.fit(y_train)
y_pred = gscv.predict(fh)
plot_series(y_train, y_test, y_pred, labels=["y_train", "y_test", "y_pred"]);
smape_loss(y_test, y_pred)
0.11346208431398466
Version 2: ‘sp’ in forecaster
(works but defeats the purpose of Grid Search)
forecaster_param_grid = {}
forecaster = AutoARIMA(sp=12, suppress_warnings=True)
cv = SlidingWindowSplitter(initial_window=int(len(y_train) * 0.90), start_with_window=True)
gscv = ForecastingGridSearchCV(forecaster, cv=cv, param_grid=forecaster_param_grid, verbose=True)
gscv.fit(y_train)
y_pred = gscv.predict(fh)
plot_series(y_train, y_test, y_pred, labels=["y_train", "y_test", "y_pred"]);
smape_loss(y_test, y_pred)
0.04117062370076287 (matches standalone AutoARIMA without GridSearch above)
Expected behavior
It does not look like the best_estimator is taking the seasonality value of 12 when ‘sp’ is just passed through the forecaster_param_grid. It only works if it is set natively in the forecaster initialization.
Additional context
Basically, I would like to create a unified flow around sktime to build and compare multiple models (ARIMA, ETS, Random Forest, SVM, etc.), including hyperparameter tuning for the statistical models. I see from the examples folder how this can be done for native scikit-learn models, but I wanted to recreate the same for the statistical models.
Versions
System:
python: 3.6.12 |Anaconda, Inc.| (default, Sep 9 2020, 00:29:25) [MSC v.1916 64 bit (AMD64)]
executable: C:\Users\xxxx\AppData\Local\Continuum\anaconda3\envs\sktime\python.exe
machine: Windows-10-10.0.18362-SP0
Python dependencies:
pip: 20.3
setuptools: 49.6.0
sklearn: 0.23.2
numpy: 1.19.2
scipy: 1.5.2
Cython: 0.29.17
pandas: 1.1.3
matplotlib: 3.3.2
joblib: 0.17.0
numba: None
pmdarima: 1.7.1
tsfresh: None
Issue Analytics
- State:
- Created 3 years ago
- Comments:7 (3 by maintainers)
Hi @mloning, thank you for suggesting the alternative. You will need to change the argument assignments in _AutoARIMA from just <param_name> to self.<param_name>, and then it will pick up the updated param value after the assignment is done in GC. It is definitely cleaner since we don’t have to rename parameters such as sp. Would you like me to submit a PR for this?
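To illustrate the difference between <param_name> and self.<param_name> described above, here is a minimal sketch in plain Python. It is not the sktime or pmdarima source: SeasonalModel, EagerForecaster, LazyForecaster and ParamsMixin are hypothetical stand-ins, and the set_params method is only assumed to mimic what a grid search does when it applies a value from the param grid.

```python
class SeasonalModel:
    """Hypothetical stand-in for the wrapped model (the role played by _AutoARIMA)."""

    def __init__(self, m=1):
        self.m = m  # seasonal period actually used by the model


class ParamsMixin:
    """Simplified set_params: overwrite attributes on the instance."""

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self


class EagerForecaster(ParamsMixin):
    """Builds the wrapped model in __init__ from the local argument `sp`."""

    def __init__(self, sp=1):
        self.sp = sp
        self.model = SeasonalModel(m=sp)  # value of sp is frozen here

    def fit(self, y):
        return self  # a later set_params(sp=...) never reaches self.model


class LazyForecaster(ParamsMixin):
    """Builds the wrapped model in fit from `self.sp`."""

    def __init__(self, sp=1):
        self.sp = sp
        self.model = None

    def fit(self, y):
        self.model = SeasonalModel(m=self.sp)  # reads the current value
        return self


y = list(range(24))  # placeholder data; the values are irrelevant here

# Mimic the grid search: construct with defaults, then set sp, then fit.
eager = EagerForecaster().set_params(sp=12).fit(y)
lazy = LazyForecaster().set_params(sp=12).fit(y)

print(eager.sp, eager.model.m)  # 12 1  -> the model still uses the default
print(lazy.sp, lazy.model.m)    # 12 12 -> the model sees the updated value
```

In the eager variant the forecaster reports sp=12 while the wrapped model still runs with the default, which would be consistent with Version 1 above giving a different score than Version 2.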
Hi @mloning, Thanks for the reference to the repo! I will go through it and let you know if I have any further questions.