Fitting TransformedTargetRegressor with sample_weight in Pipeline

Description

Can’t fit a TransformedTargetRegressor with sample_weight when its regressor is a Pipeline. Possibly related to #10945?

Steps/Code to Reproduce

Example:

import pandas as pd
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler, OneHotEncoder
from sklearn.compose import TransformedTargetRegressor, ColumnTransformer, make_column_transformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression

# Create dataset
X, y = make_regression(n_samples=10000, noise=100, n_features=10, random_state=2019)
y = np.exp((y + abs(y.min())) / 200)
w = np.random.rand(len(X))  # non-negative sample weights
cat_list = ['AA', 'BB', 'CC', 'DD']
cat = np.random.choice(cat_list, len(X), p=[0.3, 0.2, 0.2, 0.3])

df = pd.DataFrame(X, columns=["col_" + str(i) for i in range(1, 11)])
df['sample_weight'] = w
df['my_category'] = cat
df.head()

use_col = [col for col in df.columns if col not in ['sample_weight']]


numerical_features = df[use_col].dtypes == 'float'
categorical_features = ~numerical_features

categorical_transformer = Pipeline(steps=[
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

preprocess = make_column_transformer(
                                    (RobustScaler(), numerical_features),
                                    (OneHotEncoder(sparse=False), categorical_features)
)

rf = RandomForestRegressor(n_estimators=20)

clf = Pipeline(steps=[
                      ('preprocess', preprocess),
                      ('model', rf)
])

clf_trans = TransformedTargetRegressor(regressor=clf,
                                        func=np.log1p,
                                        inverse_func=np.expm1)

# Works
clf_trans.fit(df[use_col], y)

# Fails with ValueError
clf_trans.fit(df[use_col], y, sample_weight=df['sample_weight'])

Expected Results

Fitting with sample_weight

Actual Results

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-366d815659ba> in <module>()
----> 1 clf_trans.fit(df[use_col], y, sample_weight=df['sample_weight'])

~/anaconda3/envs/test_env/lib/python3.5/site-packages/sklearn/compose/_target.py in fit(self, X, y, sample_weight)
    194             self.regressor_.fit(X, y_trans)
    195         else:
--> 196             self.regressor_.fit(X, y_trans, sample_weight=sample_weight)
    197 
    198         return self

~/anaconda3/envs/test_env/lib/python3.5/site-packages/sklearn/pipeline.py in fit(self, X, y, **fit_params)
    263             This estimator
    264         """
--> 265         Xt, fit_params = self._fit(X, y, **fit_params)
    266         if self._final_estimator is not None:
    267             self._final_estimator.fit(Xt, y, **fit_params)

~/anaconda3/envs/test_env/lib/python3.5/site-packages/sklearn/pipeline.py in _fit(self, X, y, **fit_params)
    200                                 if step is not None)
    201         for pname, pval in six.iteritems(fit_params):
--> 202             step, param = pname.split('__', 1)
    203             fit_params_steps[step][param] = pval
    204         Xt = X

ValueError: not enough values to unpack (expected 2, got 1)
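
The failing line in the traceback can be reproduced without scikit-learn. This is a minimal sketch, assuming only what the traceback shows: Pipeline splits each fit parameter name on '__' to find the step it belongs to, so a bare sample_weight yields one piece where two are expected. The helper name split_fit_params is hypothetical and only mimics the loop at lines 201-203 of pipeline.py above.

```python
def split_fit_params(fit_params):
    """Group fit parameters by step name, mimicking the loop in Pipeline._fit."""
    grouped = {}
    for pname, pval in fit_params.items():
        # Same call as in the traceback: expects "<step>__<param>"
        step, param = pname.split("__", 1)
        grouped.setdefault(step, {})[param] = pval
    return grouped


# A step-prefixed name splits into two parts and is routed to that step.
routed = split_fit_params({"model__sample_weight": [1.0, 2.0]})

# A bare name splits into a single part, so the tuple unpacking fails.
try:
    split_fit_params({"sample_weight": [1.0, 2.0]})
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

In other words, the weights never reach RandomForestRegressor: the error is raised while the pipeline is still sorting out which step each keyword belongs to.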

Versions

import sklearn; sklearn.show_versions()
System:
   machine: Linux-4.4.0-127-generic-x86_64-with-debian-stretch-sid
executable: /home/gillesa/anaconda3/envs/test_env/bin/python
    python: 3.5.6 |Anaconda, Inc.| (default, Aug 26 2018, 21:41:56)  [GCC 7.3.0]

BLAS:
cblas_libs: cblas
  lib_dirs: 
    macros: 

Python deps:
   sklearn: 0.20.2
    pandas: 0.24.1
       pip: 19.0.1
setuptools: 40.2.0
     numpy: 1.16.1
    Cython: None
     scipy: 1.2.0

Issue Analytics

State:
Created 5 years ago
Comments: 17 (15 by maintainers)

Top GitHub Comments

4 reactions

jnothman commented, Mar 2, 2019

You’re right. We don’t yet seem to properly support fit parameters in TransformedTargetRegressor. And perhaps we should…

1 reaction

stefan-matcovici commented, Aug 26, 2019

Cool, I’ll give it a try then
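
For versions where TransformedTargetRegressor.fit only accepts a bare sample_weight, as in the traceback above, one possible workaround is to apply the target transform by hand and pass the weights to the pipeline under a step-prefixed name. This is a sketch under stated assumptions, not the maintainers' fix: the step name model matches the pipeline in the issue, while the smaller dataset, the single RobustScaler step, and the uniform random weights are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

# Small illustrative dataset with a strictly positive target
X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)
y = np.exp((y - y.min()) / 200)
w = np.random.RandomState(0).rand(len(X))  # illustrative non-negative weights

pipe = Pipeline(steps=[
    ("preprocess", RobustScaler()),
    ("model", RandomForestRegressor(n_estimators=20, random_state=0)),
])

# Transform the target manually (the role of func=np.log1p in the issue)
y_trans = np.log1p(y)

# Route the weights to the final step using the "<step>__<param>" convention
pipe.fit(X, y_trans, model__sample_weight=w)

# Map predictions back to the original scale (the role of inverse_func=np.expm1)
y_pred = np.expm1(pipe.predict(X))
```

The trade-off is that the forward and inverse transforms are no longer bundled with the estimator, so they have to be repeated wherever predictions are made.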
