
MultiOutputRegressor: Support for more fit parameters


Description

This is a feature request. As of the latest version of sklearn, MultiOutputRegressor.fit only supports an optional sample_weight parameter. It would be nice if it also supported an optional fit_param parameter that is passed on to estimator.fit. For example, we could then use the early stopping offered by lightgbm or xgboost to mitigate over-fitting.

I know this is a little complicated to implement, but I hope you will consider it. Thanks!

Steps/Code to Reproduce

This is my expected usage example.

#!/usr/bin/env python3

import numpy as np
from sklearn.multioutput import MultiOutputRegressor
import lightgbm as lgb

train_X = np.random.random((10, 10))
train_y = np.random.random((10, 4))
eval_X = np.random.random((10, 10))
eval_y = np.random.random((10, 4))
single_model = lgb.LGBMRegressor()
model = MultiOutputRegressor(single_model)
fit_param = {'verbose': False, 'early_stopping_rounds': 10, 'eval_set': (eval_X, eval_y)}
# Proposed API: forward fit_param to each underlying estimator's fit (not supported yet)
model.fit(train_X, train_y, fit_param=fit_param)

Expected Results

Unsupported yet.

Actual Results

Unsupported yet.

Versions

Scikit-Learn: 0.22
platform: Windows-10-10.0.14393-SP0
python: 3.6.9

Issue Analytics

State:
Created 4 years ago
Comments: 10 (6 by maintainers)

Top GitHub Comments


alessio-ca commented, Oct 16, 2021

Hi! I believe the current implementation still does not support passing the eval_set for early stopping (at least for XGBoost). The problem is that the feature matrices and targets provided by eval_set are never propagated in the chain: the matrices are never augmented, and the targets (which are 2D matrices themselves, since it’s a chain) are never split into single column vectors to be passed to the fit method.

For example:

import numpy as np
from sklearn.multioutput import RegressorChain
from xgboost import XGBRegressor

train_X = np.random.random((10, 10))
train_y = np.random.random((10, 4))
eval_X = np.random.random((10, 10))
eval_y = np.random.random((10, 4))

base_reg = XGBRegressor()
chain = RegressorChain(base_estimator=base_reg, order=[0, 1, 2, 3])

fit_param = {'verbose': False, 'early_stopping_rounds': 10, 'eval_set': [(eval_X, eval_y)]}
chain.fit(train_X, train_y, **fit_param)

Result:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/var/folders/zg/575k6939237_dd479ds5mxhr0000gn/T/ipykernel_30304/157940229.py in <module>
     12 
     13 fit_param = {'verbose': False, 'early_stopping_rounds':10, 'eval_set':[(eval_X, eval_y)]}
---> 14 chain.fit(train_X, train_y, **fit_param)

~/.pyenv/versions/miniconda3-latest/envs/my_env/lib/python3.8/site-packages/sklearn/multioutput.py in fit(self, X, Y, **fit_params)
    857         self : object
    858         """
--> 859         super().fit(X, Y, **fit_params)
    860         return self
    861 

~/.pyenv/versions/miniconda3-latest/envs/my_env/lib/python3.8/site-packages/sklearn/multioutput.py in fit(self, X, Y, **fit_params)
    526             y = Y[:, self.order_[chain_idx]]
    527 
--> 528             estimator.fit(X_aug[:, : (X.shape[1] + chain_idx)], y, **fit_params)
    529             if self.cv is not None and chain_idx < len(self.estimators_) - 1:
    530                 col_idx = X.shape[1] + chain_idx

~/.pyenv/versions/miniconda3-latest/envs/my_env/lib/python3.8/site-packages/xgboost/core.py in inner_f(*args, **kwargs)
    431         for k, arg in zip(sig.parameters, args):
    432             kwargs[k] = arg
--> 433         return f(**kwargs)
    434 
    435     return inner_f

~/.pyenv/versions/miniconda3-latest/envs/my_env/lib/python3.8/site-packages/xgboost/sklearn.py in fit(self, X, y, sample_weight, base_margin, eval_set, eval_metric, early_stopping_rounds, verbose, xgb_model, sample_weight_eval_set, base_margin_eval_set, feature_weights, callbacks)
    709         evals_result = {}
    710 
--> 711         train_dmatrix, evals = _wrap_evaluation_matrices(
    712             missing=self.missing,
    713             X=X,

~/.pyenv/versions/miniconda3-latest/envs/my_env/lib/python3.8/site-packages/xgboost/sklearn.py in _wrap_evaluation_matrices(missing, X, y, group, qid, sample_weight, base_margin, feature_weights, eval_set, sample_weight_eval_set, base_margin_eval_set, eval_group, eval_qid, create_dmatrix, label_transform)
    279                 evals.append(train_dmatrix)
    280             else:
--> 281                 m = create_dmatrix(
    282                     data=valid_X,
    283                     label=label_transform(valid_y),

~/.pyenv/versions/miniconda3-latest/envs/my_env/lib/python3.8/site-packages/xgboost/sklearn.py in <lambda>(**kwargs)
    723             eval_group=None,
    724             eval_qid=None,
--> 725             create_dmatrix=lambda **kwargs: DMatrix(nthread=self.n_jobs, **kwargs),
    726         )
    727         params = self.get_xgb_params()

~/.pyenv/versions/miniconda3-latest/envs/my_env/lib/python3.8/site-packages/xgboost/core.py in inner_f(*args, **kwargs)
    431         for k, arg in zip(sig.parameters, args):
    432             kwargs[k] = arg
--> 433         return f(**kwargs)
    434 
    435     return inner_f

~/.pyenv/versions/miniconda3-latest/envs/my_env/lib/python3.8/site-packages/xgboost/core.py in __init__(self, data, label, weight, base_margin, missing, silent, feature_names, feature_types, nthread, group, qid, label_lower_bound, label_upper_bound, feature_weights, enable_categorical)
    547         self.handle = handle
    548 
--> 549         self.set_info(
    550             label=label,
    551             weight=weight,

~/.pyenv/versions/miniconda3-latest/envs/my_env/lib/python3.8/site-packages/xgboost/core.py in inner_f(*args, **kwargs)
    431         for k, arg in zip(sig.parameters, args):
    432             kwargs[k] = arg
--> 433         return f(**kwargs)
    434 
    435     return inner_f

~/.pyenv/versions/miniconda3-latest/envs/my_env/lib/python3.8/site-packages/xgboost/core.py in set_info(self, label, weight, base_margin, group, qid, label_lower_bound, label_upper_bound, feature_names, feature_types, feature_weights)
    587 
    588         if label is not None:
--> 589             self.set_label(label)
    590         if weight is not None:
    591             self.set_weight(weight)

~/.pyenv/versions/miniconda3-latest/envs/my_env/lib/python3.8/site-packages/xgboost/core.py in set_label(self, label)
    718         """
    719         from .data import dispatch_meta_backend
--> 720         dispatch_meta_backend(self, label, 'label', 'float')
    721 
    722     def set_weight(self, weight):

~/.pyenv/versions/miniconda3-latest/envs/my_env/lib/python3.8/site-packages/xgboost/data.py in dispatch_meta_backend(matrix, data, name, dtype)
    693     """Dispatch for meta info."""
    694     handle = matrix.handle
--> 695     _validate_meta_shape(data)
    696     if data is None:
    697         return

~/.pyenv/versions/miniconda3-latest/envs/my_env/lib/python3.8/site-packages/xgboost/data.py in _validate_meta_shape(data)
    637 def _validate_meta_shape(data):
    638     if hasattr(data, "shape"):
--> 639         assert len(data.shape) == 1 or (
    640             len(data.shape) == 2 and (data.shape[1] == 0 or data.shape[1] == 1)
    641         )

AssertionError:

As you can see, the target of eval_set (eval_y) is passed to XGBoost as a 2D matrix, which is not allowed. Even if you fix the problem for eval_y, the feature matrix of eval_set (eval_X) is not augmented when traversing the chain, raising an error as well in the next iteration.

Versions: XGBoost 1.4.2, scikit-learn 0.24.2
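
To make the two missing steps concrete, here is a minimal sketch of a hand-written chain loop that slices the evaluation target per column and augments the evaluation features as it goes. It is not scikit-learn's or XGBoost's code: StubRegressor and fit_chain_with_eval_set are hypothetical names, the stub is a plain least-squares model that only mimics the shape checks seen in the traceback, and feeding earlier predictions into the evaluation features is an assumption about what a fix might do.

```python
import numpy as np


class StubRegressor:
    """Hypothetical stand-in for an estimator whose fit() accepts eval_set.

    Like XGBoost in the traceback above, it rejects multi-column labels.
    """

    def fit(self, X, y, eval_set=None):
        for eval_X, eval_y in eval_set or []:
            if np.ndim(eval_y) != 1:
                raise ValueError("eval_set labels must be 1D")
            if eval_X.shape[1] != X.shape[1]:
                raise ValueError("eval_set features must match training features")
        # Ordinary least squares with an intercept column.
        A = np.column_stack([X, np.ones(len(X))])
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([X, np.ones(len(X))]) @ self.coef_


def fit_chain_with_eval_set(make_estimator, X, Y, eval_X, eval_Y):
    """Fit one estimator per target, chaining targets in column order."""
    estimators = []
    X_aug, eval_X_aug = X, eval_X
    for idx in range(Y.shape[1]):
        est = make_estimator()
        # Step 1: slice the evaluation target to the same single column as y.
        est.fit(X_aug, Y[:, idx], eval_set=[(eval_X_aug, eval_Y[:, idx])])
        estimators.append(est)
        # Step 2: augment both feature matrices before the next link.
        # Training uses the true target (as RegressorChain does when cv=None);
        # evaluation uses this link's prediction (an assumption of this sketch).
        X_aug = np.column_stack([X_aug, Y[:, idx]])
        eval_X_aug = np.column_stack([eval_X_aug, est.predict(eval_X_aug)])
    return estimators
```

Calling fit_chain_with_eval_set(StubRegressor, train_X, train_y, eval_X, eval_y) with the arrays from the example above would fit four links, each seeing a 1D evaluation target and an evaluation matrix with the same number of columns as its training matrix.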


thomasjpfan commented, Apr 14, 2022

In this case, RegressorChain or MultiOutputRegressor does not know which fit parameters to slice.

To be fully generic, we would need to accept a process_fit_params callable parameter in RegressorChain or MultiOutputRegressor. During fit, the indices to slice on and fit_params are passed in, and the callable returns the new fit params with the data correctly sliced.
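
As a rough illustration of that idea, the sketch below fits one copy of an estimator per target and lets a user-supplied callable decide how to slice the fit parameters. Nothing here exists in scikit-learn: fit_per_target, slice_eval_set, and MeanRegressor are hypothetical names, the callable's (index, fit_params) signature is only one reading of the proposal, and copy.deepcopy stands in for sklearn.base.clone so the example needs only NumPy.

```python
import copy

import numpy as np


class MeanRegressor:
    """Hypothetical stand-in for an estimator whose fit() accepts eval_set."""

    def fit(self, X, y, eval_set=None):
        for _, eval_y in eval_set or []:
            if np.ndim(eval_y) != 1:
                raise ValueError("eval_set labels must be 1D")
        self.mean_ = float(np.mean(y))
        # Mean squared error of the constant prediction on each evaluation set.
        self.eval_scores_ = [
            float(np.mean((eval_y - self.mean_) ** 2)) for _, eval_y in eval_set or []
        ]
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


def slice_eval_set(target_idx, fit_params):
    """Example process_fit_params callable: keep one target column of eval_set."""
    params = dict(fit_params)
    if "eval_set" in params:
        params["eval_set"] = [
            (eval_X, eval_Y[:, target_idx]) for eval_X, eval_Y in params["eval_set"]
        ]
    return params


def fit_per_target(estimator, X, Y, process_fit_params, **fit_params):
    """Fit an independent copy of `estimator` for each column of Y."""
    fitted = []
    for idx in range(Y.shape[1]):
        est = copy.deepcopy(estimator)  # stand-in for sklearn.base.clone
        # The meta-estimator stays generic: only the callable knows what to slice.
        est.fit(X, Y[:, idx], **process_fit_params(idx, fit_params))
        fitted.append(est)
    return fitted
```

The point of the design is that the meta-estimator never has to guess which fit parameters carry per-target data; parameters such as verbose pass through untouched, while the callable rewrites eval_set for each target.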
