Uniform handling of y_train in forecasting performance metrics

Is your feature request related to a problem? Please describe.

Some forecasting performance metrics require y_train, e.g. MASE. This adds complexity to higher-level functionality that expects a common interface for metrics, such as the evaluate function or ForecastingGridSearchCV, and to unit testing (see #672).

Current problem

The example below currently fails because y_train is not passed internally when calling scoring.

from sktime.forecasting.all import *
from sktime.forecasting.model_evaluation import evaluate

y = load_airline()
f = NaiveForecaster()
cv = SlidingWindowSplitter()
scoring = MASE()

# fails: evaluate does not pass y_train on to the MASE metric internally
out = evaluate(f, cv, y, scoring=scoring)

Possible solutions

  1. Change the interface of all performance metrics to optionally accept y_train, with only those metrics that require it actually using it. This requires wrapping metrics from scikit-learn.
  2. Add case distinctions in higher-level functionality to separately handle metrics that require y_train and those that do not (see the sketch after this list). This requires adding a requires_y_train attribute to metric classes.
  3. Adapt the metrics interface at run time, making the case distinctions inside an adapter and exposing a uniform interface to higher-level functionality (suggested by @fkiraly). This also requires adding a requires_y_train attribute to metric classes.
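
For illustration, here is a rough sketch of the case distinction from option 2, under the assumption that metric classes gain a requires_y_train attribute; the _evaluate_scoring helper is hypothetical and not part of sktime.

def _evaluate_scoring(scoring, y_test, y_pred, y_train):
    """Sketch of the per-caller case distinction from option 2."""
    if getattr(scoring, "requires_y_train", False):
        # scaled metrics such as MASE need the training series
        return scoring(y_test, y_pred, y_train)
    # scale-free metrics such as sMAPE ignore the training series
    return scoring(y_test, y_pred)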

Describe the solution you’d like

import numpy as np

from sktime.forecasting.all import *
from sktime.forecasting.model_evaluation import evaluate

y = load_airline()
fh = np.arange(1, 10)
y_train, y_test = temporal_train_test_split(y, fh=fh)
f = NaiveForecaster()
f.fit(y_train)
y_pred = f.predict(fh)

# uniform interface
scoring = MASE()
scoring.requires_y_train = True   # attribute the metric class would define
scoring = check_scoring(scoring)
scoring(y_test, y_pred, y_train)
>>> 3.577770878609128

scoring = sMAPE()
scoring.requires_y_train = False  # attribute the metric class would define
scoring = check_scoring(scoring)
scoring(y_test, y_pred, y_train)
>>> 0.1780237534499896

Here’s a rough implementation of the adapter-based solution:

class _MetricAdapter:
    """
    Adapter for performance metrics to uniformly handle the
    y_train requirement of some metrics.
    """

    def __init__(self, metric):
        # wrap metric object
        self.metric = metric
        
    def __call__(self, y_true, y_pred, y_train, *args, **kwargs):
        """Compute metric, uniformly handling those metrics that 
        require `y_train` and those that do not.
        """
        
        # if y_train is required, pass it on
        if self.metric.requires_y_train:
            return self.metric(y_true, y_pred, y_train, *args, **kwargs)
        
        # otherwise, ignore y_train
        else:
            return self.metric(y_true, y_pred, *args, **kwargs)     
        
    def __getattr__(self, attr):
        # delegate attribute queries to the wrapped metric object
        return getattr(self.metric, attr)

    def __repr__(self):
        return repr(self.metric)

    
def _adapt_scoring(scoring):
    """Helper function to adapt scoring to uniformly handle y_train requirement"""
    return _MetricAdapter(scoring)


def check_scoring(scoring):
    """
    Validate `scoring` object.

    Parameters
    ----------
    scoring : object
        Callable metric object.
    
    Returns
    -------
    scoring : object
        Adapted `scoring` object, or an adapted sMAPE() if `scoring` is None.
    
    Raises
    ------
    TypeError
        If `scoring` is not a callable object.
    """
    from sktime.performance_metrics.forecasting import sMAPE
    from sktime.performance_metrics.forecasting._classes import MetricFunctionWrapper
    
    if scoring is None:
        # default to sMAPE, adapted as well so the returned callable always
        # exposes the uniform (y_true, y_pred, y_train) signature
        return _adapt_scoring(sMAPE())

    if not callable(scoring):
        raise TypeError("`scoring` must be a callable object")

    valid_base_class = MetricFunctionWrapper
    if not isinstance(scoring, valid_base_class):
        raise TypeError(f"`scoring` must inherit from `{valid_base_class.__name__}`")

    return _adapt_scoring(scoring)
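
With this in place, higher-level functionality such as evaluate could validate the metric once and then always pass y_train, letting the adapter drop it for metrics that do not need it. The _score helper below is a hypothetical illustration of that calling pattern, not sktime code:

def _score(scoring, y_test, y_pred, y_train):
    # validate/adapt once, then always call with the uniform signature;
    # the adapter ignores y_train for metrics that do not require it
    scoring = check_scoring(scoring)
    return scoring(y_test, y_pred, y_train)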

Top GitHub Comments

RNKuhns commented on May 4, 2021

@mloning and @fkiraly I’ve started a draft PR (#858).

I still need to update the performance_metric tests, but wanted to start the PR to get some initial feedback.

RNKuhns commented on May 4, 2021

@mloning No problem bringing it back up. I am with you in terms of not being as comfortable with the functions in _functions.py all requiring y_train.

Also, thanks for the clarification on kwargs; it helps to hear the background on these things so I can keep it in mind for future work! I definitely get that we want to keep hyperparameters/config args in the constructor and out of the __call__ method. I had been thinking that ruled out kwargs in general.

If we are good to use kwargs for things that don’t fall into that hyperparameter/config args bucket, then I’m with @fkiraly and think we can use them to catch y_train when it is passed to the function.

Higher-level functionality like evaluate would just always pass y_train.
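
A minimal sketch of that kwargs-based approach, purely for illustration (the _SMAPESketch and _MASESketch names are made up and the formulas simplified, not the actual sktime implementations):

import numpy as np

class _SMAPESketch:
    def __call__(self, y_true, y_pred, **kwargs):
        # scale-free metric: silently ignores extra kwargs such as y_train
        return np.mean(2 * np.abs(y_true - y_pred) / (np.abs(y_true) + np.abs(y_pred)))

class _MASESketch:
    def __call__(self, y_true, y_pred, **kwargs):
        # scaled metric: pulls the training series out of kwargs
        y_train = kwargs["y_train"]
        scale = np.mean(np.abs(np.diff(y_train)))  # mean in-sample naive error
        return np.mean(np.abs(y_true - y_pred)) / scale

# a caller like evaluate can then use a single call pattern for every metric:
# score = metric(y_test, y_pred, y_train=y_train)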

I’ll start implementing and can make any tweaks once I get a draft PR up.
