[BUG] Wrong forecast calculation for freq values other than 1
Describe the bug
Hi! I have just come across sktime, and while performing a quick test of ExponentialSmoothing on 15-minute data (freq="15T"), the prediction raises an exception. However, with unit frequencies (1T, 1H, etc.) the program seems to work.
I think this is somehow related to issue #534.
To Reproduce
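import numpy as np
import pandas as pd
from sktime.forecasting.exp_smoothing import ExponentialSmoothing
from sktime.forecasting.model_selection import temporal_train_test_split

# 97 quarter-hourly observations
idx = pd.period_range("2021-01-01", "2021-01-02", freq="15T")
values = np.random.randint(1, 10, len(idx))
ts = pd.Series(values, index=idx)
y_train, y_test = temporal_train_test_split(ts, test_size=1/3)
fh = np.arange(len(y_test)) + 1  # relative horizon 1..len(y_test)
forecaster = ExponentialSmoothing(trend="add")
forecaster.fit(y_train)
y_pred = forecaster.predict(fh)  # raises the KeyError below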
KeyError: "None of [PeriodIndex(['2021-01-01 16:00', '2021-01-01 16:15', '2021-01-01 16:30',\n '2021-01-01 16:45', '2021-01-01 17:00', '2021-01-01 17:15',\n '2021-01-01 17:30', '2021-01-01 17:45', '2021-01-01 18:00',\n '2021-01-01 18:15', '2021-01-01 18:30', '2021-01-01 18:45',\n '2021-01-01 19:00', '2021-01-01 19:15', '2021-01-01 19:30',\n '2021-01-01 19:45', '2021-01-01 20:00', '2021-01-01 20:15',\n '2021-01-01 20:30', '2021-01-01 20:45', '2021-01-01 21:00',\n '2021-01-01 21:15', '2021-01-01 21:30', '2021-01-01 21:45',\n '2021-01-01 22:00', '2021-01-01 22:15', '2021-01-01 22:30',\n '2021-01-01 22:45', '2021-01-01 23:00', '2021-01-01 23:15',\n '2021-01-01 23:30', '2021-01-01 23:45', '2021-01-02 00:00'],\n dtype='period[15T]', freq='15T')] are in the [index]"
Expected behavior
Prediction should work.
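For reference, the index the forecast should carry can be computed with plain pandas Period arithmetic. This is a sketch, not sktime code: it assumes the 33-point test split that appears in the KeyError above (97 periods, test_size=1/3), and it uses the "15min" alias, which pandas treats the same as "15T" (newer pandas versions deprecate "T").

```python
import numpy as np
import pandas as pd

# Same 97 quarter-hour periods as in the reproduction.
idx = pd.period_range("2021-01-01", "2021-01-02", freq="15min")
n_test = 33                            # test size seen in the KeyError above
cutoff = idx[len(idx) - n_test - 1]    # last training period (15:45)
fh = np.arange(n_test) + 1             # relative horizon 1..33

# Adding an integer k to a Period moves it k steps of its own frequency,
# so this is the absolute horizon the forecast should be indexed by.
expected = pd.PeriodIndex([cutoff + int(h) for h in fh])
print(expected[0], expected[-1])       # 2021-01-01 16:00 2021-01-02 00:00
```

These are exactly the periods listed in the KeyError, which suggests the requested horizon itself is right and the failure happens when sktime converts it to integer positions.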
Additional context
While debugging, the problem seems to be here in _fh.py:
# Note: We should here also coerce to periods for more reliable arithmetic
# operations as in `to_relative` but currently doesn't work with
# `update_predict` and incomplete time indices where the `freq` information
# is lost, see comment on issue #534
integers = absolute - start
if isinstance(absolute, (pd.PeriodIndex, pd.DatetimeIndex)):
    integers = _coerce_duration_to_int(integers, freq=_get_freq(cutoff))
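To see what integers = absolute - start holds for a 15-minute index, here is a small pandas-only sketch. The start period and the four-period index are made-up values, and it assumes a pandas version in which subtracting two Periods returns a DateOffset, as recent versions do.

```python
import pandas as pd

# Hypothetical values: four 15-minute periods after a start at 15:45.
absolute = pd.period_range("2021-01-01 16:00", periods=4, freq="15min")
start = pd.Period("2021-01-01 15:45", freq="15min")

# Period - Period returns a DateOffset counted in the frequency's base unit
# (minutes), so one 15-minute step shows up as n == 15 rather than 1.
durations = [p - start for p in absolute]
print([d.n for d in durations])  # [15, 30, 45, 60]
```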
The function _coerce_duration_to_int does not take the cutoff frequency into account for the offsets produced by a PeriodIndex, as it does for Timedeltas:
elif isinstance(duration, pd.Index) and isinstance(
    duration[0], pd.tseries.offsets.BaseOffset
):
    return pd.Int64Index([d.n for d in duration])
elif isinstance(duration, (pd.Timedelta, pd.TimedeltaIndex)):
    if freq is None:
        raise ValueError("`unit` missing")
As a result, the values in integers are multiplied by 15 (the multiplier of freq 15T in the example), so the resulting PeriodIndex positions are far too large, which raises the exception.
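One possible fix, sketched here as a hypothetical standalone helper (offsets_to_steps is not an sktime function), is to divide each offset's n by the multiplier of the cutoff frequency before building the integer index. The sketch uses pd.Index(..., dtype="int64") instead of pd.Int64Index, which was removed in pandas 2.0.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def offsets_to_steps(durations, freq):
    """Convert DateOffsets produced by Period subtraction into step counts.

    Hypothetical helper. Assumes every offset is a whole multiple of
    `freq`, which holds when both periods share the same frequency.
    """
    freq = to_offset(freq)  # accepts a string alias or a DateOffset
    return pd.Index([d.n // freq.n for d in durations], dtype="int64")


absolute = pd.period_range("2021-01-01 16:00", periods=4, freq="15min")
start = pd.Period("2021-01-01 15:45", freq="15min")
steps = offsets_to_steps([p - start for p in absolute], absolute.freq)
print(steps.tolist())  # [1, 2, 3, 4]
```

With a unit frequency freq.n is 1, so the division is a no-op, which matches the observation that 1T and 1H already work.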
Versions
System:
python: 3.7.5 (tags/v3.7.5:5c02a39a0b, Oct 15 2019, 00:11:34) [MSC v.1916 64 bit (AMD64)]
executable: C:\Users\XXX\AppData\Local\Programs\Python\Python37\python.exe
machine: Windows-10-10.0.19041-SP0
Python dependencies:
pip: 20.1.1
setuptools: 41.2.0
sklearn: 0.24.1
sktime: 0.5.3
statsmodels: 0.12.2
numpy: 1.20.1
scipy: 1.3.3
Cython: 0.29.17
pandas: 1.2.2
matplotlib: 3.1.3
joblib: 0.14.0
numba: 0.52.0
pmdarima: 1.8.0
tsfresh: None
I haven't had time to look into this yet…
For now, let me just share with you a temporary patch that I have used in the past. It ain’t great, but it works… 👀
This is a transformer that changes the series index to a unit frequency. The default is "T" (minute), but it doesn't matter which frequency you use, since the only thing that matters for the sktime internals is that freq is unitary. This transformer can be used as the first step in a pipeline; the final results are then transformed back to the original freq. Unfortunately, this approach only works with relative forecasting horizons.

Yes, thanks for pointing this out.
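The unit-frequency workaround described in the comment above can be sketched with plain pandas. This is not the shared patch itself and not an sktime transformer: to_unit_freq and from_unit_freq are hypothetical helpers, and the sketch assumes the series has a PeriodIndex and that forecasts are requested with a relative horizon.

```python
import numpy as np
import pandas as pd


def to_unit_freq(y, unit="min"):
    """Hypothetical helper: relabel `y` with consecutive unit-frequency periods.

    Returns the relabelled series and the original first period, which is
    needed to map results back.
    """
    unit_index = pd.period_range("2000-01-01", periods=len(y), freq=unit)
    return pd.Series(y.to_numpy(), index=unit_index), y.index[0]


def from_unit_freq(y_unit, unit_start, original_start):
    """Hypothetical helper: map a unit-frequency series back to the original freq.

    The period k steps after `unit_start` becomes the period k steps after
    `original_start`. With a unit frequency, Period subtraction returns an
    offset whose n is exactly the step count k.
    """
    steps = [(p - unit_start).n for p in y_unit.index]
    index = pd.PeriodIndex([original_start + k for k in steps])
    return pd.Series(y_unit.to_numpy(), index=index)


# Usage: eight 15-minute observations.
idx = pd.period_range("2021-01-01", periods=8, freq="15min")
y = pd.Series(np.arange(8.0), index=idx)
y_unit, original_start = to_unit_freq(y)
unit_start = y_unit.index[0]

# Stand-in for a forecaster's 2-step-ahead output on the unit index.
last = y_unit.index[-1]
y_pred_unit = pd.Series([8.0, 9.0], index=pd.PeriodIndex([last + 1, last + 2]))

y_pred = from_unit_freq(y_pred_unit, unit_start, original_start)
print([str(p) for p in y_pred.index])  # ['2021-01-01 02:00', '2021-01-01 02:15']
```

Because the mapping back relies only on step counts, it needs a relative horizon; an absolute horizon expressed in the original 15-minute periods has no meaning on the relabelled unit index.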