[BUG] Wrong Forecast calculation on freq values not 1

Describe the bug

Hi! I have just come across sktime. In a quick test of ExponentialSmoothing with 15-minute data (freq="15T"), the prediction raises an exception. With unit frequencies (1T, 1H, etc.) the program seems to work.

I think this is somehow related to issue #534.

To Reproduce

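import numpy as np
import pandas as pd
from sktime.forecasting.exp_smoothing import ExponentialSmoothing
from sktime.forecasting.model_selection import temporal_train_test_split

idx = pd.period_range("2021-01-01", "2021-01-02", freq="15T")
values = np.random.randint(1, 10, len(idx))
ts = pd.Series(values, index=idx)
y_train, y_test = temporal_train_test_split(ts, test_size=1/3)
fh = np.arange(len(y_test)) + 1  # relative horizon: 1, 2, ..., len(y_test)
forecaster = ExponentialSmoothing(trend="add")
forecaster.fit(y_train)
y_pred = forecaster.predict(fh)  # raises the KeyError below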

KeyError: "None of [PeriodIndex(['2021-01-01 16:00', '2021-01-01 16:15', '2021-01-01 16:30',\n             '2021-01-01 16:45', '2021-01-01 17:00', '2021-01-01 17:15',\n             '2021-01-01 17:30', '2021-01-01 17:45', '2021-01-01 18:00',\n             '2021-01-01 18:15', '2021-01-01 18:30', '2021-01-01 18:45',\n             '2021-01-01 19:00', '2021-01-01 19:15', '2021-01-01 19:30',\n             '2021-01-01 19:45', '2021-01-01 20:00', '2021-01-01 20:15',\n             '2021-01-01 20:30', '2021-01-01 20:45', '2021-01-01 21:00',\n             '2021-01-01 21:15', '2021-01-01 21:30', '2021-01-01 21:45',\n             '2021-01-01 22:00', '2021-01-01 22:15', '2021-01-01 22:30',\n             '2021-01-01 22:45', '2021-01-01 23:00', '2021-01-01 23:15',\n             '2021-01-01 23:30', '2021-01-01 23:45', '2021-01-02 00:00'],\n            dtype='period[15T]', freq='15T')] are in the [index]"

Expected behavior

Prediction should work.

Additional context

While debugging, the problem seems to be here in _fh.py:

        # Note: We should here also coerce to periods for more reliable arithmetic
        # operations as in `to_relative` but currently doesn't work with
        # `update_predict` and incomplete time indices where the `freq` information
        # is lost, see comment on issue #534
        integers = absolute - start

        if isinstance(absolute, (pd.PeriodIndex, pd.DatetimeIndex)):
            integers = _coerce_duration_to_int(integers, freq=_get_freq(cutoff))

The function _coerce_duration_to_int does not take the cutoff frequency into account for the offsets that come from a PeriodIndex, as it does for Timedeltas:

    elif isinstance(duration, pd.Index) and isinstance(
        duration[0], pd.tseries.offsets.BaseOffset
    ):
        return pd.Int64Index([d.n for d in duration])
    elif isinstance(duration, (pd.Timedelta, pd.TimedeltaIndex)):
        if freq is None:
            raise ValueError("`unit` missing")

As a result, the values in integers are multiplied by 15 (for freq 15T in the example), so the generated PeriodIndex entries are far larger than expected, hence the exception.
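The effect can be reproduced with pandas alone. The sketch below is not the sktime implementation: `coerce_offsets_to_steps` is a hypothetical helper name, and dividing by the frequency multiple is only one possible way to correct the count. It uses the alias "15min", which denotes the same frequency as "15T".

```python
import pandas as pd

# Four consecutive 15-minute periods ("15min" is the same frequency as "15T").
idx = pd.period_range("2021-01-01 00:00", periods=4, freq="15min")

# Subtracting a Period from a PeriodIndex yields an Index of offset objects.
offsets = idx - idx[0]

# Reading `.n` directly, as the quoted branch does, counts base units (minutes).
raw_counts = [offset.n for offset in offsets]


def coerce_offsets_to_steps(offsets, freq):
    """Hypothetical helper: convert offsets to a number of `freq` steps."""
    # Divide by the frequency multiple (15 for a 15-minute frequency).
    return [offset.n // freq.n for offset in offsets]


steps = coerce_offsets_to_steps(offsets, idx.freq)

print(raw_counts)  # counts in minutes
print(steps)       # counts in 15-minute steps
```

With a unit frequency the multiple is 1, so the division changes nothing, which is consistent with the report that 1T and 1H work.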

Versions

System:
    python: 3.7.5 (tags/v3.7.5:5c02a39a0b, Oct 15 2019, 00:11:34) [MSC v.1916 64 bit (AMD64)]
    executable: C:\Users\XXX\AppData\Local\Programs\Python\Python37\python.exe
    machine: Windows-10-10.0.19041-SP0

Python dependencies:
    pip: 20.1.1
    setuptools: 41.2.0
    sklearn: 0.24.1
    sktime: 0.5.3
    statsmodels: 0.12.2
    numpy: 1.20.1
    scipy: 1.3.3
    Cython: 0.29.17
    pandas: 1.2.2
    matplotlib: 3.1.3
    joblib: 0.14.0
    numba: 0.52.0
    pmdarima: 1.8.0
    tsfresh: None

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments:8

Top GitHub Comments

1 reaction
tpvasconcelos commented, Mar 15, 2021

I did not have time to look into this yet…

For now, let me just share with you a temporary patch that I have used in the past. It ain’t great, but it works… 👀

import pandas as pd
from sktime.transformations.base import _SeriesToSeriesTransformer
from sktime.utils.validation.series import check_series


def _transform_freq(Z, freq):
    # Build a new index with the requested freq, keeping Z's start and length.
    Z_index = Z.sort_index(ascending=True).index
    return pd.date_range(start=Z_index[0], periods=Z_index.size, freq=freq)


class Hotfix670(_SeriesToSeriesTransformer):
    """Temporarily re-index a series to a unit frequency."""

    def __init__(self, freq="T") -> None:
        # super(Hotfix670).__init__() never reached the parent initializer.
        super().__init__()
        self.freq = freq
        self.freq_original = None  # set in transform, used by inverse_transform

    def transform(self, Z, X=None):
        self.check_is_fitted()
        Z = check_series(Z).copy()
        # Remember the original frequency so it can be restored later.
        self.freq_original = Z.index.freqstr
        Z.index = _transform_freq(Z=Z, freq=self.freq)
        return Z

    def inverse_transform(self, Z, X=None):
        self.check_is_fitted()
        Z = check_series(Z).copy()
        # Map the index back to the frequency seen in transform.
        Z.index = _transform_freq(Z=Z, freq=self.freq_original)
        return Z

This is a transformer that changes the series index to a unit frequency. The default is "T" (minute) but it doesn’t matter what frequency you use since the only thing that matters for the sktime internals is that freq is unitary. This transformer can be used as the first step in a pipeline. The end results will then be transformed back to the original freq. Unfortunately, this approach only works when using relative forecasting horizons.
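The re-indexing round trip that the transformer relies on can be shown with pandas alone. This is only a sketch under stated assumptions: `reindex_with_freq` is a hypothetical stand-in for `_transform_freq`, the series uses a DatetimeIndex, and no sktime forecaster is involved.

```python
import numpy as np
import pandas as pd


def reindex_with_freq(series, freq):
    """Hypothetical helper: same start and length, but a different freq."""
    out = series.sort_index().copy()
    out.index = pd.date_range(start=out.index[0], periods=out.index.size, freq=freq)
    return out


# A short series sampled every 15 minutes.
y = pd.Series(
    np.arange(4.0),
    index=pd.date_range("2021-01-01", periods=4, freq="15min"),
)

# Step 1: pretend the data is sampled every minute (a unit frequency).
y_unit = reindex_with_freq(y, "min")

# Step 2: after forecasting, map the index back to the original frequency.
y_back = reindex_with_freq(y_unit, "15min")
```

The values are untouched throughout; only the timestamps are relabelled. Because the relabelled timestamps no longer match the real ones, an absolute forecasting horizon would point at the wrong positions, which is in line with the limitation to relative horizons mentioned above.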

0 reactions
fkiraly commented, Feb 23, 2022

Yes, thanks for pointing this out.
