
[BUG] Wrong Forecast calculation on freq values not 1


Describe the bug

Hi! I have just come across sktime. In a quick test of ExponentialSmoothing with 15-minute data (freq="15T"), the prediction raises an exception. With unit frequencies (1T, 1H, etc.) the same program seems to work.

I think this is somehow related to issue #534.

To Reproduce

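import numpy as np
import pandas as pd
from sktime.forecasting.exp_smoothing import ExponentialSmoothing
from sktime.forecasting.model_selection import temporal_train_test_split

# 15-minute PeriodIndex: the non-unit frequency is what triggers the error
idx = pd.period_range("2021-01-01", "2021-01-02", freq="15T")
values = np.random.randint(1, 10, len(idx))
ts = pd.Series(values, index=idx)
y_train, y_test = temporal_train_test_split(ts, test_size=1/3)
# Relative forecasting horizon: 1, 2, ..., len(y_test)
fh = np.arange(len(y_test)) + 1
forecaster = ExponentialSmoothing(trend="add")
forecaster.fit(y_train)
y_pred = forecaster.predict(fh)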

KeyError: "None of [PeriodIndex(['2021-01-01 16:00', '2021-01-01 16:15', '2021-01-01 16:30',\n             '2021-01-01 16:45', '2021-01-01 17:00', '2021-01-01 17:15',\n             '2021-01-01 17:30', '2021-01-01 17:45', '2021-01-01 18:00',\n             '2021-01-01 18:15', '2021-01-01 18:30', '2021-01-01 18:45',\n             '2021-01-01 19:00', '2021-01-01 19:15', '2021-01-01 19:30',\n             '2021-01-01 19:45', '2021-01-01 20:00', '2021-01-01 20:15',\n             '2021-01-01 20:30', '2021-01-01 20:45', '2021-01-01 21:00',\n             '2021-01-01 21:15', '2021-01-01 21:30', '2021-01-01 21:45',\n             '2021-01-01 22:00', '2021-01-01 22:15', '2021-01-01 22:30',\n             '2021-01-01 22:45', '2021-01-01 23:00', '2021-01-01 23:15',\n             '2021-01-01 23:30', '2021-01-01 23:45', '2021-01-02 00:00'],\n            dtype='period[15T]', freq='15T')] are in the [index]"

Expected behavior

Prediction should work.

Additional context

While debugging, the problem seems to be located here, in _fh.py:

        # Note: We should here also coerce to periods for more reliable arithmetic
        # operations as in `to_relative` but currently doesn't work with
        # `update_predict` and incomplete time indices where the `freq` information
        # is lost, see comment on issue #534
        integers = absolute - start

        if isinstance(absolute, (pd.PeriodIndex, pd.DatetimeIndex)):
            integers = _coerce_duration_to_int(integers, freq=_get_freq(cutoff))

The function _coerce_duration_to_int does not take the cutoff frequency into account for a PeriodIndex, as it does for Timedeltas:

    elif isinstance(duration, pd.Index) and isinstance(
        duration[0], pd.tseries.offsets.BaseOffset
    ):
        return pd.Int64Index([d.n for d in duration])
    elif isinstance(duration, (pd.Timedelta, pd.TimedeltaIndex)):
        if freq is None:
            raise ValueError("`unit` missing")

As a result, the integers are multiplied by 15 (for freq 15T in the example), so the generated PeriodIndex values lie far beyond the training index, which causes the exception.
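The factor of 15 can be reproduced with pandas alone. The following is a minimal sketch, not the sktime code: it assumes a recent pandas version, uses the alias "15min" in place of the "15T" from the issue, and the variable names (`offsets`, `raw`, `steps`) are only illustrative. It shows that reading `.n` from the offsets produced by a period subtraction counts minutes, while integer addition on a Period moves in whole 15-minute steps.

```python
import pandas as pd

# Assumption: "15min" is used instead of the "15T" alias from the issue,
# because newer pandas versions deprecate the "T" alias.
idx = pd.period_range("2021-01-01 00:00", periods=4, freq="15min")
start = idx[0]

# Subtracting a Period from a PeriodIndex yields an Index of offset objects.
offsets = idx - start

# Reading `.n` from each offset counts base units (minutes), not 15-minute steps.
raw = [d.n for d in offsets]

# Dividing by the frequency multiple recovers the number of steps.
steps = [n // idx.freq.n for n in raw]

print(raw)    # minute counts
print(steps)  # step counts

# Integer addition on a Period moves in whole steps of the frequency,
# so the step count round-trips while the raw minute count overshoots.
print(start + steps[1] == idx[1])
print(start + raw[1] == idx[1])
```

Under these assumptions, dividing by `idx.freq.n` is one way to make the integers consistent with Period arithmetic; whether that is the right fix inside sktime is not confirmed by the issue.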

Versions

System:
python: 3.7.5 (tags/v3.7.5:5c02a39a0b, Oct 15 2019, 00:11:34) [MSC v.1916 64 bit (AMD64)]
executable: C:\Users\XXX\AppData\Local\Programs\Python\Python37\python.exe
machine: Windows-10-10.0.19041-SP0

Python dependencies:
pip: 20.1.1
setuptools: 41.2.0
sklearn: 0.24.1
sktime: 0.5.3
statsmodels: 0.12.2
numpy: 1.20.1
scipy: 1.3.3
Cython: 0.29.17
pandas: 1.2.2
matplotlib: 3.1.3
joblib: 0.14.0
numba: 0.52.0
pmdarima: 1.8.0
tsfresh: None

Issue Analytics

Created 3 years ago
Comments: 8

Top GitHub Comments

1 reaction

tpvasconcelos commented, Mar 15, 2021

I did not have time to look into this yet…

For now, let me just share with you a temporary patch that I have used in the past. It ain’t great, but it works… 👀

import pandas as pd
from sktime.transformations.base import _SeriesToSeriesTransformer
from sktime.utils.validation.series import check_series


def _transform_freq(Z, freq):
    Z_index = Z.sort_index(ascending=True).index
    return pd.date_range(start=Z_index[0], periods=Z_index.size, freq=freq)


class Hotfix670(_SeriesToSeriesTransformer):

    def __init__(self, freq="T") -> None:
        super().__init__()
        self.freq = freq
        self.freq_original = None

    def transform(self, Z, X=None):
        self.check_is_fitted()
        Z = check_series(Z).copy()
        self.freq_original = Z.index.freqstr
        Z.index = _transform_freq(Z=Z, freq=self.freq)
        return Z

    def inverse_transform(self, Z, X=None):
        self.check_is_fitted()
        Z = check_series(Z).copy()
        Z.index = _transform_freq(Z=Z, freq=self.freq_original)
        return Z

This is a transformer that changes the series index to a unit frequency. The default is "T" (minute), but the choice of frequency does not matter: the only thing that matters for the sktime internals is that freq is unitary. This transformer can be used as the first step in a pipeline. The end results will then be transformed back to the original freq. Unfortunately, this approach only works when using relative forecasting horizons.
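The index swap behind this workaround can be sketched with pandas alone. This is an illustration under assumptions, not part of the patch: `to_freq` is a hypothetical helper that mirrors `_transform_freq`, the series uses a DatetimeIndex, and the alias "min" stands in for "T".

```python
import numpy as np
import pandas as pd


def to_freq(series, freq):
    """Hypothetical helper: keep the values, rebuild the index at `freq`."""
    out = series.sort_index().copy()
    out.index = pd.date_range(start=out.index[0], periods=len(out), freq=freq)
    return out


# Eight observations at 15-minute spacing.
idx = pd.date_range("2021-01-01", periods=8, freq="15min")
y = pd.Series(np.arange(8.0), index=idx)

# Remember the original frequency so the swap can be undone later.
original_freq = y.index.freq

# Forward step: same values on a unit-frequency (1 minute) index.
y_unit = to_freq(y, "min")

# Inverse step: restore the original 15-minute index.
y_back = to_freq(y_unit, original_freq)

print(y_unit.index[:3])
print(y_back.index.equals(y.index))
```

Only the index labels change; the values and their order are untouched, which is why the trick depends on the forecaster caring about step counts rather than actual timestamps.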

0 reactions

fkiraly commented, Feb 23, 2022

Yes, thanks for pointing this out.
