Pipeline in Pipeline seems to not work well with setting of parameters using `.set_params`

Description

Using a Pipeline nested inside another Pipeline in GridSearchCV fails intermittently. Use the snippet below to reproduce (it fails ~50% of the time).

Steps/Code to Reproduce

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import Lasso
from sklearn.dummy import DummyRegressor
from sklearn.pipeline import Pipeline

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75)

gscv = GridSearchCV(
    estimator=Pipeline([ # pipeline in a pipeline
        ('a', Pipeline([
            ('b', DummyRegressor())
        ]))
    ]),
    param_grid={
        # 'a__b' swaps the dummy step for Lasso; 'a__b__alpha' is only valid after that swap
        'a__b__alpha': [0.1, 0.001],
        'a__b': [Lasso()],
    }
)

gscv.fit(X_train, y_train)
print(gscv.score(X_test, y_test))

Expected Results

The code should work without exceptions.

Actual Results

Sometimes I get an error of the form:

...
  File "/home/iaroslav/.local/lib/python3.5/site-packages/sklearn/pipeline.py", line 144, in set_params
    self._set_params('steps', **kwargs)
  File "/home/iaroslav/.local/lib/python3.5/site-packages/sklearn/utils/metaestimators.py", line 49, in _set_params
    super(_BaseComposition, self).set_params(**params)
  File "/home/iaroslav/.local/lib/python3.5/site-packages/sklearn/base.py", line 276, in set_params
    sub_object.set_params(**{sub_name: value})
  File "/home/iaroslav/.local/lib/python3.5/site-packages/sklearn/base.py", line 283, in set_params
    (key, self.__class__.__name__))
ValueError: Invalid parameter alpha for estimator DummyRegressor. Check the list of available parameters with `estimator.get_params().keys()`.

Reason for the issue

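It appears that the order in which parameters are set is random. Because of this, the value of a__b__alpha is sometimes set before the step a__b has been replaced, so alpha is applied to the DummyRegressor that is still in place. See the code under "Further code to reproduce".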
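Separately from the reproduction that follows, the order dependence can be sketched with a toy model. Everything here is hypothetical: `ToyEstimator`, `set_one`, `build_model` and `apply_in_order` are illustrative names, not scikit-learn code. The sketch only assumes that a key such as `a__b__alpha` is resolved step by step against whatever object currently sits at each level.

```python
class ToyEstimator:
    """Hypothetical stand-in for an estimator; not scikit-learn code."""

    def __init__(self, label, **params):
        self.label = label
        self.params = dict(params)

    def set_one(self, key, value):
        # Resolve "name__rest" by delegating to the object stored under "name".
        name, delim, rest = key.partition("__")
        if name not in self.params:
            raise ValueError("Invalid parameter %s for %s" % (name, self.label))
        if delim:
            self.params[name].set_one(rest, value)
        else:
            self.params[name] = value


def build_model():
    # Same shape as the report: outer pipeline -> step "a" -> step "b".
    dummy = ToyEstimator("DummyRegressor", strategy="mean")  # has no "alpha"
    inner = ToyEstimator("Pipeline", b=dummy)
    return ToyEstimator("Pipeline", a=inner)


def apply_in_order(model, ordered_items):
    for key, value in ordered_items:
        model.set_one(key, value)
    return model


lasso = ToyEstimator("Lasso", alpha=1.0)

# Replacement first, then its parameter: succeeds.
ok = apply_in_order(build_model(), [("a__b", lasso), ("a__b__alpha", 0.01)])

# Parameter first, while "b" is still the dummy: fails.
try:
    apply_in_order(build_model(), [("a__b__alpha", 0.01), ("a__b", lasso)])
    error = None
except ValueError as exc:
    error = str(exc)
```

With an unordered dict, either of the two sequences can occur, which would match the roughly 50% failure rate reported above.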

Further code to reproduce

This raises the same exception:

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import Lasso
from sklearn.dummy import DummyRegressor
from sklearn.pipeline import Pipeline

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75)

model = Pipeline([ # pipeline in a pipeline
    ('a', Pipeline([
        ('b', DummyRegressor())
    ]))
])

model.set_params(**{
    'a__b': Lasso(),
    'a__b__alpha': 0.01,  # scalar: set_params assigns the value as given
})

model.fit(X_train, y_train)

Versions

Linux-4.10.0-37-generic-x86_64-with-Ubuntu-16.04-xenial
Python 3.5.2 (default, Aug 18 2017, 17:48:00) [GCC 5.4.0 20160609]
NumPy 1.13.3
SciPy 0.19.1
Scikit-Learn 0.19.0

Possible solution?

Maybe it would help to set parameters in order from the shortest parameter name to the longest, so that a step is always replaced before its own parameters are set. But a closer look at Pipeline itself may also be necessary.

Should one avoid using a Pipeline inside a Pipeline? And could the issue also affect other nested estimators, e.g. a Pipeline in a FeatureUnion in a Pipeline?
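The shortest-first idea can be sketched as a small helper. `order_params` is a hypothetical name, not part of scikit-learn. It relies on the fact that a key such as `a__b` is always shorter than any key nested under it, such as `a__b__alpha`.

```python
def order_params(params):
    """Return (key, value) pairs sorted from shortest key to longest.

    Ties are broken alphabetically so the result is deterministic.
    """
    return sorted(params.items(), key=lambda kv: (len(kv[0]), kv[0]))


params = {"a__b__alpha": 0.01, "a__b": "replacement estimator"}
ordered = order_params(params)
ordered_keys = [key for key, _ in ordered]
```

As a possible workaround (an assumption, not verified against 0.19.0 here), the pairs could then be applied one at a time with `model.set_params(**{key: value})`, so that the order no longer depends on dict iteration.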

P.S. Thanks for the awesome library.

Issue Analytics

Created 6 years ago
Comments: 11 (8 by maintainers)

Top GitHub Comments

2 reactions

amueller commented, Oct 17, 2017

Lol, I was trying to reproduce and couldn’t, and I think I know why: I’m using Python 3.6, where all dicts are ordered. I think we need to make the iteration ordered in BaseEstimator.set_params; that should fix it.
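One way to make the outcome independent of dict order is to handle direct keys first and defer nested ones. The sketch below is a toy illustration of that idea, not the actual change made to BaseEstimator.set_params. `ToyEstimator` and `build_model` are hypothetical names.

```python
from collections import defaultdict


class ToyEstimator:
    """Hypothetical stand-in for an estimator; not scikit-learn code."""

    def __init__(self, label, **params):
        self.label = label
        self.params = dict(params)

    def set_params(self, **params):
        nested = defaultdict(dict)
        for key, value in params.items():
            name, delim, rest = key.partition("__")
            if name not in self.params:
                raise ValueError("Invalid parameter %s for %s" % (name, self.label))
            if delim:
                # Defer nested keys until direct replacements are in place.
                nested[name][rest] = value
            else:
                self.params[name] = value
        # Second pass: recurse into whatever object now sits under each name.
        for name, sub_params in nested.items():
            self.params[name].set_params(**sub_params)
        return self


def build_model():
    dummy = ToyEstimator("DummyRegressor", strategy="mean")  # has no "alpha"
    inner = ToyEstimator("Pipeline", b=dummy)
    return ToyEstimator("Pipeline", a=inner)


# The nested key is passed first on purpose; the result does not depend on it.
model = build_model().set_params(**{
    "a__b__alpha": 0.01,
    "a__b": ToyEstimator("Lasso", alpha=1.0),
})
```

Because every level replaces its own attributes before recursing, `alpha` reaches the replacement step rather than the dummy one, whichever key comes first.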

1 reaction

jnothman commented, Jan 2, 2019

@shafaypro, RandomizedSearchCV does not currently support the kinds of conditional parameter spaces that searchgrid facilitates for GridSearchCV.