question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Fitting TransformedTargetRegressor with sample_weight in Pipeline

See original GitHub issue

Description

Can’t fit a TransformedTargetRegressor using sample_weight. May be link to #10945 ?

Steps/Code to Reproduce

Example:

import pandas as pd
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler, OneHotEncoder
from sklearn.compose import TransformedTargetRegressor, ColumnTransformer, make_column_transformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression

# Create dataset
X, y = make_regression(n_samples=10000, noise=100, n_features=10, random_state=2019)
y = np.exp((y + abs(y.min())) / 200)
w = np.random.randn(len(X))
cat_list = ['AA', 'BB', 'CC', 'DD']
cat = np.random.choice(cat_list, len(X), p=[0.3, 0.2, 0.2, 0.3])

df = pd.DataFrame(X, columns=["col_" + str(i) for i in range(1, 11)])
df['sample_weight'] = w
df['my_caterogy'] = cat
df.head()

image

use_col = [col for col in df.columns if col not in ['sample_weight']]


numerical_features = df[use_col].dtypes == 'float'
categorical_features = ~numerical_features

categorical_transformer = Pipeline(steps=[
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

preprocess = make_column_transformer(
                                    (RobustScaler(), numerical_features),
                                    (OneHotEncoder(sparse=False), categorical_features)
)

rf = RandomForestRegressor(n_estimators=20)

clf = Pipeline(steps=[
                      ('preprocess', preprocess),
                      ('model', rf)
])

clf_trans = TransformedTargetRegressor(regressor=clf,
                                        func=np.log1p,
                                        inverse_func=np.expm1)

# Work
clf_trans.fit(df[use_col], y)

# Fail
clf_trans.fit(df[use_col], y, sample_weight=df['sample_weight'])

Expected Results

Fitting with sample_weight

Actual Results

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-366d815659ba> in <module>()
----> 1 clf_trans.fit(df[use_col], y, sample_weight=df['sample_weight'])

~/anaconda3/envs/test_env/lib/python3.5/site-packages/sklearn/compose/_target.py in fit(self, X, y, sample_weight)
    194             self.regressor_.fit(X, y_trans)
    195         else:
--> 196             self.regressor_.fit(X, y_trans, sample_weight=sample_weight)
    197 
    198         return self

~/anaconda3/envs/test_env/lib/python3.5/site-packages/sklearn/pipeline.py in fit(self, X, y, **fit_params)
    263             This estimator
    264         """
--> 265         Xt, fit_params = self._fit(X, y, **fit_params)
    266         if self._final_estimator is not None:
    267             self._final_estimator.fit(Xt, y, **fit_params)

~/anaconda3/envs/test_env/lib/python3.5/site-packages/sklearn/pipeline.py in _fit(self, X, y, **fit_params)
    200                                 if step is not None)
    201         for pname, pval in six.iteritems(fit_params):
--> 202             step, param = pname.split('__', 1)
    203             fit_params_steps[step][param] = pval
    204         Xt = X

ValueError: not enough values to unpack (expected 2, got 1)

Versions

import sklearn; sklearn.show_versions()
System:
   machine: Linux-4.4.0-127-generic-x86_64-with-debian-stretch-sid
executable: /home/gillesa/anaconda3/envs/test_env/bin/python
    python: 3.5.6 |Anaconda, Inc.| (default, Aug 26 2018, 21:41:56)  [GCC 7.3.0]

BLAS:
cblas_libs: cblas
  lib_dirs: 
    macros: 

Python deps:
   sklearn: 0.20.2
    pandas: 0.24.1
       pip: 19.0.1
setuptools: 40.2.0
     numpy: 1.16.1
    Cython: None
     scipy: 1.2.0

Issue Analytics

  • State:closed
  • Created 5 years ago
  • Comments:17 (15 by maintainers)

github_iconTop GitHub Comments

4reactions
jnothmancommented, Mar 2, 2019

You’re right. we don’t yet seem to properly support fit parameters in TransformedTargetRegressor. And perhaps we should…

1reaction
stefan-matcovicicommented, Aug 26, 2019

Cool, I’ll give it a try then

Read more comments on GitHub >

github_iconTop Results From Across the Web

sklearn.compose.TransformedTargetRegressor
TransformedTargetRegressor : Poisson regression and non-normal loss Poisson ... This regressor will automatically be cloned each time prior to fitting.
Read more >
How to use Custom Sklearn Classes and Pipelines
We start by defining a class that inherits from TransformerMixin which gives us the fit_transform method if we define the fit and transform ......
Read more >
Is it possible to add TransformedTargetRegressor into a scikit ...
No, because the scikit-learn original Pipeline does not change the y or the number of samples in X and y during the steps....
Read more >
Pipeline — Version 0.10.0 - Imbalanced-Learn
The final estimator only needs to implement fit. The transformers and samplers in the pipeline can be cached using memory argument. The purpose...
Read more >
Python Examples of sklearn.linear_model.Lasso
regr.fit, X, y) # fit with sample_weight with a regressor which does not support it sample_weight ... regr = TransformedTargetRegressor(regressor=Lasso(), ...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found