
Get rid of the `random_state` argument in `get_pipeline`


Currently, the `best_pipeline` property of `AutoMLSearch` always returns a pipeline with `random_state=0`, because 0 is the default value of `random_state` in the `get_pipeline` method.
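A minimal, self-contained sketch of the reported behaviour, assuming a heavily simplified stand-in for `AutoMLSearch`. The `ToySearch` and `ToyPipeline` classes are hypothetical and are not EvalML's API. Because `get_pipeline` has its own default `random_state=0`, a property that calls it without forwarding the search's seed always yields seed 0.

```python
from dataclasses import dataclass


@dataclass
class ToyPipeline:
    """Hypothetical stand-in for an EvalML pipeline; it only tracks its seed."""
    name: str
    random_state: int


class ToySearch:
    """Hypothetical, simplified stand-in for AutoMLSearch."""

    def __init__(self, random_state=0):
        self.random_state = random_state
        self._best_name = "Random Forest Pipeline"

    def get_pipeline(self, name, random_state=0):
        # The default of 0 is used whenever the caller does not pass a seed.
        return ToyPipeline(name, random_state)

    @property
    def best_pipeline(self):
        # The reported problem: the search's own seed is never forwarded.
        return self.get_pipeline(self._best_name)


search = ToySearch(random_state=5)
print(search.best_pipeline.random_state)  # prints 0, not 5
```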

I think this is confusing: if I create an `AutoMLSearch` object with `random_state=5`, for example, I expect every pipeline created and returned by `AutoMLSearch` to have `random_state=5`.

I think we should remove the `random_state` argument from `get_pipeline` and instead make sure that each pipeline's random state is set to whatever it was when that pipeline was fit.
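One way the proposal could look, sketched under the assumption that the search records the seed each pipeline was fit with. The `ToySearch` class, its `search` method, and the `_fit_seeds` mapping are hypothetical illustrations, not EvalML's implementation; picking the "best" pipeline is reduced to a placeholder.

```python
from dataclasses import dataclass


@dataclass
class ToyPipeline:
    """Hypothetical stand-in for an EvalML pipeline; it only tracks its seed."""
    name: str
    random_state: int


class ToySearch:
    """Hypothetical search that remembers the seed each pipeline was fit with."""

    def __init__(self, random_state=0):
        self.random_state = random_state
        self._fit_seeds = {}  # pipeline name -> seed used at fit time
        self._best_name = None

    def search(self, names):
        for name in names:
            # A real search would fit and score each pipeline here.
            self._fit_seeds[name] = self.random_state
        self._best_name = names[0]  # placeholder for "best by score"

    def get_pipeline(self, name):
        # No random_state argument: reuse the seed the pipeline was fit with.
        return ToyPipeline(name, self._fit_seeds[name])

    @property
    def best_pipeline(self):
        return self.get_pipeline(self._best_name)


search = ToySearch(random_state=5)
search.search(["Random Forest Pipeline", "Logistic Regression Pipeline"])
print(search.best_pipeline.random_state)  # prints 5
```

Dropping the parameter removes the only place where a silent default could override the search's seed, so callers cannot accidentally get a pipeline that differs from the one that was evaluated.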

Issue Analytics

Created 3 years ago
Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction

bchen1116 commented, Jan 21, 2021

Through discussion with @freddyaboulton, we decided it’s fine for the pipeline's `random_state` to be an `np.random.RandomState` object, but we want to make sure that, depending on the input, it is either seeded with the `random_state` int the user passes in, or is the `np.random.RandomState` object the user passes in.

This means that if a user passes `random_state=4` to `AutoMLSearch`, we create the pipeline's `random_state` with 4 as the seed, and if a user passes `random_state=np.random.RandomState()`, we pass that object through to the pipeline instead.
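The rule described above can be sketched as a small normalizing helper. The name `to_random_state` is hypothetical, not an EvalML function; scikit-learn's `sklearn.utils.check_random_state` follows the same pattern (and additionally accepts `None`). This sketch rejects booleans and other types outright, which is a design choice, not something the comment specifies.

```python
import numpy as np


def to_random_state(random_state):
    """Hypothetical helper: turn an int seed or a RandomState into a RandomState.

    - int: build a new np.random.RandomState seeded with that int.
    - np.random.RandomState: pass the user's object through unchanged.
    """
    if isinstance(random_state, np.random.RandomState):
        return random_state
    if isinstance(random_state, (int, np.integer)) and not isinstance(random_state, bool):
        return np.random.RandomState(random_state)
    raise TypeError(f"Expected int or np.random.RandomState, got {type(random_state)!r}")


# An int seed of 4 gives a generator that reproduces np.random.RandomState(4).
a = to_random_state(4)
b = np.random.RandomState(4)
print(a.randint(0, 100, size=3).tolist() == b.randint(0, 100, size=3).tolist())  # True

# A user-supplied RandomState object is reused as-is, not copied.
rs = np.random.RandomState(7)
print(to_random_state(rs) is rs)  # True
```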

0 reactions

bchen1116 commented, Jan 21, 2021

@freddyaboulton I think the biggest issue is that our `pipeline_base` sets the `random_state` to an `np.random.RandomState` object, as the code linked in the original comment shows.

If we remove the `np.random.RandomState` object, we can preserve the random state the user passes in, but we would also need to change the default `random_state` behavior in `component_base`. This issue does seem to be tied to a discussion about support for `np.random.RandomState`. What do you think?
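A minimal sketch of the trade-off being discussed, assuming a hypothetical `ToyComponent` rather than EvalML's `component_base`. Once an int is converted into a `RandomState`, the original int is no longer stored anywhere convenient; keeping the user's raw input and deriving the generator from it preserves both. Storing the constructor argument unchanged is also the convention scikit-learn estimators follow.

```python
import numpy as np


class ToyComponent:
    """Hypothetical component that keeps the user's seed alongside a RandomState."""

    def __init__(self, random_state=0):
        # Preserve exactly what the user passed (int or RandomState) ...
        self.random_state = random_state
        # ... and derive the generator used internally from it.
        if isinstance(random_state, np.random.RandomState):
            self.rng = random_state
        else:
            self.rng = np.random.RandomState(random_state)

    def clone(self):
        # Rebuild from the stored input rather than from the generator.
        return ToyComponent(random_state=self.random_state)


c = ToyComponent(random_state=5)
print(c.random_state)  # prints 5: the original int is still available
first = c.rng.rand(3)
second = c.clone().rng.rand(3)
print(np.array_equal(first, second))  # True, because the clone is reseeded with 5

shared = np.random.RandomState(1)
d = ToyComponent(random_state=shared)
print(d.clone().rng is shared)  # True: the same object is reused
```

Note the asymmetry: an int-seeded clone replays the same stream, while a clone built from a user-supplied `RandomState` shares that object and continues its stream instead of restarting it.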

Top Results From Across the Web

What does the random_state parameter do in sklearn's ...

It is a parameter that enables you to get consistent results. If you set it to 1, every time you run the code...

10. Common pitfalls and recommended practices - Scikit-learn

For reproducible results across executions, remove any use of random_state=None. 10.3.1. Using None or RandomState instances, and repeated calls to fit ...

Manipulating machine learning results with random state

Training data that I will be using: Using grid search to find the optimal xgboost hyperparameters, I got the best parameters for the...

Is random state a parameter to tune? - Cross Validated

So the question is simple, should I take random state as a hyperparameter? Why is that? If my model outperforms others with different...

EvalML Documentation

EvalML has many options to configure the pipeline search. At the minimum, we need to define an objective function.