
Support for regression by classification

See original GitHub issue

My team and I are working on an application of regression by classification, a technique described in this article.

In a nutshell, regression by classification means approaching a regression problem with multi-class classification algorithms. The key step in this technique is discretization, or binning, of the (continuous) target prior to classification. The article mentions 3 approaches to target discretization, all of which are supported by sklearn’s KBinsDiscretizer.

  1. Equally probable interval (this is the quantile strategy of KBinsDiscretizer)
  2. Equal width interval (this is the uniform strategy of KBinsDiscretizer)
  3. K-means clustering (this is the kmeans strategy of KBinsDiscretizer)
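For illustration, here is a minimal sketch of those three strategies applied to a synthetic skewed target (the data is made up; the strategy names are the actual KBinsDiscretizer values):

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

rng = np.random.RandomState(0)
# A skewed continuous target, where the choice of strategy matters most.
y = rng.exponential(scale=2.0, size=200).reshape(-1, 1)

for strategy in ("quantile", "uniform", "kmeans"):
    disc = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy=strategy)
    y_binned = disc.fit_transform(y)
    print(strategy, "bin edges:", np.round(disc.bin_edges_[0], 2))
```

On skewed data, "quantile" produces narrow bins where the mass is and wide bins in the tail, while "uniform" splits the range into equal-width slices regardless of density.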

In regression by classification, the choice of the number of classes, the n_bins parameter, is critical. One straightforward way to tune this parameter, and to choose the binning strategy, is cross-validation. But because transformations on y (see #4143) are currently forbidden in scikit-learn, this is not “natively” supported.

We found a way around this by creating our own meta-estimator, as suggested by @jnothman elsewhere. But one problem remained: how can we tell scikit-learn to compute evaluation metrics on BINNED targets, and not the original CONTINUOUS targets?
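Roughly, our meta-estimator looks like the following (a simplified sketch, not our exact code; RegressionByClassification and get_transformed_targets are our own names, not part of scikit-learn):

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.preprocessing import KBinsDiscretizer


class RegressionByClassification(BaseEstimator, RegressorMixin):
    """Bin the continuous target, then fit a classifier on the bin labels."""

    def __init__(self, classifier, n_bins=5, strategy="quantile"):
        self.classifier = classifier
        self.n_bins = n_bins
        self.strategy = strategy

    def fit(self, X, y):
        self.discretizer_ = KBinsDiscretizer(
            n_bins=self.n_bins, encode="ordinal", strategy=self.strategy)
        y_binned = self.discretizer_.fit_transform(
            np.asarray(y, dtype=float).reshape(-1, 1)).ravel()
        self.classifier_ = clone(self.classifier).fit(X, y_binned)
        return self

    def predict(self, X):
        # Predicts a bin label; mapping labels back to a numeric value
        # (e.g. the bin midpoint) is a separate design decision.
        return self.classifier_.predict(X)

    def get_transformed_targets(self, X, y):
        # Hook inspected by the scorer hack: score against binned targets.
        return self.discretizer_.transform(
            np.asarray(y, dtype=float).reshape(-1, 1)).ravel()
```

With this in place, n_bins and strategy become ordinary hyperparameters, so GridSearchCV can tune them like any other parameter.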

We achieved this by hacking the _PredictScorer class on our scikit-learn fork. The hack looks for a special custom method called get_transformed_targets on our home-brewed meta-estimator. If this method is present, the score is computed using transformed (binned) targets. Here is the hack:

class _PredictScorer(_BaseScorer):
    def _score(self, method_caller, estimator, X, y_true, sample_weight=None):
        """[... docstring ...]
        """
        #Here starts the hack
        if hasattr(estimator, 'get_transformed_targets'):
            y_true = estimator.get_transformed_targets(X, y_true)
        #Here ends the hack

        y_pred = method_caller(estimator, "predict", X)
        if sample_weight is not None:
            return self._sign * self._score_func(y_true, y_pred,
                                                 sample_weight=sample_weight,
                                                 **self._kwargs)
        else:
            return self._sign * self._score_func(y_true, y_pred,
                                                 **self._kwargs)

Another problem we encountered was applying the KBinsDiscretizer class to targets. We plan on doing this with a custom meta-transformer.
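The wrinkle is that KBinsDiscretizer, like all scikit-learn transformers, expects 2-D input, so a 1-D target has to be reshaped to a single column and flattened back afterwards (a minimal sketch with made-up values):

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

y = np.array([0.1, 0.4, 1.2, 3.5, 7.9, 15.0])  # 1-D continuous target

# Reshape to a column for the transformer, then flatten the result.
disc = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="quantile")
y_binned = disc.fit_transform(y.reshape(-1, 1)).ravel()
```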

It would be nice if regression by classification were supported by scikit-learn out of the box. Perhaps the resampling options coming soon will make this possible, but that will have to be tested.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments:25 (20 by maintainers)

Top GitHub Comments

1 reaction
jnothman commented, Dec 11, 2019

We try to encourage good practice, particularly around evaluation. Evaluating in the classification space does not tell you about how well you solved the regression problem. I think the API can make it possible, but should not make it too easy or the default.

Read more comments on GitHub >

Top Results From Across the Web

Regression for Classification | Hands on Experience
We all have developed numerous regression models in our lives. But only few are familiar with using regression models for classification.

Why not approach classification through regression?
Logistic regression predicts probabilities, and is therefore a regression algorithm. However, it is commonly described as a classification ...

Difference Between Classification and Regression in Machine ...
Classification is about predicting a label (e.g. 'red'). Regression is about predicting a quantity (e.g. 100). Does that help? Reply.

Regression vs. Classification in Machine Learning for Beginners
In this article, we examine regression versus classification in machine learning, including definitions, types, differences, and uses.

Regression vs Classification in Machine Learning - Javatpoint
Regression and Classification algorithms are Supervised Learning algorithms. Both the algorithms are used for prediction in Machine learning and work with ...
