Pipeline in Pipeline seems to not work well with setting of parameters using `.set_params`


Description

Using a Pipeline nested inside another Pipeline in GridSearchCV fails intermittently. The snippet below reproduces the failure roughly 50% of the time.

Steps/Code to Reproduce

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import Lasso
from sklearn.dummy import DummyRegressor
from sklearn.pipeline import Pipeline

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75)

gscv = GridSearchCV(
    estimator=Pipeline([ # pipeline in a pipeline
        ('a', Pipeline([
            ('b', DummyRegressor())
        ]))
    ]),
    param_grid={
        'a__b__alpha':[0.1, 0.001],
        'a__b':[Lasso()],
    }
)

gscv.fit(X_train, y_train)
print(gscv.score(X_test, y_test))

Expected Results

The code should work without exceptions.

Actual Results

Sometimes I get an error of the form

...
  File "/home/iaroslav/.local/lib/python3.5/site-packages/sklearn/pipeline.py", line 144, in set_params
    self._set_params('steps', **kwargs)
  File "/home/iaroslav/.local/lib/python3.5/site-packages/sklearn/utils/metaestimators.py", line 49, in _set_params
    super(_BaseComposition, self).set_params(**params)
  File "/home/iaroslav/.local/lib/python3.5/site-packages/sklearn/base.py", line 276, in set_params
    sub_object.set_params(**{sub_name: value})
  File "/home/iaroslav/.local/lib/python3.5/site-packages/sklearn/base.py", line 283, in set_params
    (key, self.__class__.__name__))
ValueError: Invalid parameter alpha for estimator DummyRegressor. Check the list of available parameters with `estimator.get_params().keys()`.

Reason for the issue

It appears that the order in which parameters are set is random. As a result, the value of a__b__alpha is sometimes set before the step a__b has been replaced, while a__b is still a DummyRegressor. See the code below.

Further code to reproduce

This raises the same exception:

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import Lasso
from sklearn.dummy import DummyRegressor
from sklearn.pipeline import Pipeline

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75)

model = Pipeline([ # pipeline in a pipeline
    ('a', Pipeline([
        ('b', DummyRegressor())
    ]))
])

model.set_params(**{
    'a__b':Lasso(),
    'a__b__alpha':0.01,
})

model.fit(X_train, y_train)
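
The order dependence can also be illustrated without scikit-learn. The following is a minimal sketch, not scikit-learn's implementation: `Node`, `make_model`, and `try_order` are hypothetical names, and `Node.set_params` is assumed to apply parameters strictly in the order it receives them, which is the behavior the traceback above suggests.

```python
class Node:
    """Toy stand-in for an estimator; its parameters live in a plain dict."""

    def __init__(self, **params):
        self.params = dict(params)

    def set_params(self, **params):
        # Apply each parameter in the order received (assumed behavior).
        for key, value in params.items():
            name, delim, sub_key = key.partition("__")
            if name not in self.params:
                raise ValueError("Invalid parameter %r" % name)
            if delim:
                # Nested key: delegate to whatever object is stored right now.
                self.params[name].set_params(**{sub_key: value})
            else:
                self.params[name] = value
        return self


def make_model():
    # outer "pipeline" -> step a -> step b, where b has no alpha parameter
    return Node(a=Node(b=Node(strategy="mean")))


def try_order(params):
    """Return 'ok' if the parameters apply cleanly, else the error text."""
    try:
        make_model().set_params(**params)
        return "ok"
    except ValueError as exc:
        return str(exc)


# Replacing step b first succeeds; setting alpha first hits the old step b.
print(try_order({"a__b": Node(alpha=1.0), "a__b__alpha": 0.01}))
print(try_order({"a__b__alpha": 0.01, "a__b": Node(alpha=1.0)}))
```

On Python versions where dict iteration order is not guaranteed, either of the two orders could occur for the same input, which would match the roughly 50% failure rate.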

Versions

Linux-4.10.0-37-generic-x86_64-with-Ubuntu-16.04-xenial
Python 3.5.2 (default, Aug 18 2017, 17:48:00) [GCC 5.4.0 20160609]
NumPy 1.13.3
SciPy 0.19.1
Scikit-Learn 0.19.0

Possible solution?

Maybe it would help to set parameters in order from the shortest parameter name to the longest. It may also be necessary to look more closely at Pipeline.
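
As a rough sketch of the shortest-name-first idea, used as a caller-side workaround rather than a change to the library: the helper names `shortest_name_first` and `set_params_in_order` are hypothetical, and the only assumption about the estimator is that it has a `set_params(**kwargs)` method.

```python
def shortest_name_first(params):
    """Return (name, value) pairs ordered from shortest to longest name.

    A key such as 'a__b' is a strict prefix of 'a__b__alpha', so it is
    always shorter and therefore always applied first.
    """
    return sorted(params.items(), key=lambda item: (len(item[0]), item[0]))


def set_params_in_order(estimator, params):
    """Apply parameters one at a time so the order is deterministic."""
    for name, value in shortest_name_first(params):
        estimator.set_params(**{name: value})
    return estimator


ordered = shortest_name_first({"a__b__alpha": 0.01, "a__b": "new step"})
print([name for name, _ in ordered])
```

Calling `set_params` once per parameter avoids relying on dict iteration order inside the estimator, at the cost of one call per parameter.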

Should one avoid using a Pipeline in a Pipeline? And could the issue also affect more complex estimators, e.g. a Pipeline in a FeatureUnion in a Pipeline?

P.S. Thanks for the awesome library.

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 11 (8 by maintainers)

Top GitHub Comments

amueller commented, Oct 17, 2017

Lol, I was trying to reproduce this and couldn’t, and I think I know why: I’m using Python 3.6, where all dicts are ordered. I think we need to make the iteration ordered in BaseEstimator.set_params; that should fix it.
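
One way to make the result independent of iteration order is to set an object's direct parameters first and only then pass the grouped nested parameters down. The sketch below shows that idea on a toy class. It is an assumption about how an order-independent `set_params` could look, not the code in BaseEstimator, and `OrderedNode` is a hypothetical name.

```python
from collections import defaultdict


class OrderedNode:
    """Toy estimator whose set_params does not depend on key order."""

    def __init__(self, **params):
        self.params = dict(params)

    def set_params(self, **params):
        nested = defaultdict(dict)
        for key, value in params.items():
            name, delim, sub_key = key.partition("__")
            if name not in self.params:
                raise ValueError("Invalid parameter %r" % name)
            if delim:
                # Collect nested keys; do not apply them yet.
                nested[name][sub_key] = value
            else:
                # Direct parameters (e.g. a replaced step) are set right away.
                self.params[name] = value
        # Nested parameters now reach the already-replaced sub-objects.
        for name, sub_params in nested.items():
            self.params[name].set_params(**sub_params)
        return self


model = OrderedNode(a=OrderedNode(b=OrderedNode(strategy="mean")))
# alpha is listed before the replacement of step b, yet this succeeds.
model.set_params(**{"a__b__alpha": 0.01, "a__b": OrderedNode(alpha=1.0)})
print(model.params["a"].params["b"].params["alpha"])
```

Because every level defers its nested keys until its own direct parameters are in place, the same grouping repeats at each depth of nesting.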

jnothman commented, Jan 2, 2019

@shafaypro, RandomizedSearchCV does not currently support the kinds of conditional parameter spaces that searchgrid facilitates for GridSearchCV.
