Get rid of the `random_state` argument in `get_pipeline`

Currently, the `best_pipeline` property in `AutoMLSearch` always returns a pipeline with `random_state=0`, because that is the default value of `random_state` in the `get_pipeline` method.

I think this is confusing: if I create an `AutoMLSearch` object with `random_state=5`, for example, I expect all of the pipelines created and returned by `AutoMLSearch` to have `random_state=5`.

I think we should remove the `random_state` argument from `get_pipeline` and instead make sure that the returned pipeline's random state is set to whatever it was when the pipeline was fit.
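The proposal can be illustrated with a minimal sketch. `ToyPipeline`, `ToySearch`, and `fit_pipeline` are hypothetical stand-ins invented for this example, not EvalML's classes or methods; the only point is that `get_pipeline` takes no `random_state` argument and reuses the value recorded at fit time.

```python
class ToyPipeline:
    """Hypothetical stand-in for a pipeline; not EvalML's class."""

    def __init__(self, name, random_state):
        self.name = name
        self.random_state = random_state


class ToySearch:
    """Hypothetical stand-in for AutoMLSearch, for illustration only."""

    def __init__(self, random_state=0):
        self.random_state = random_state
        # Maps pipeline_id -> (name, random_state in effect at fit time).
        self._results = {}

    def fit_pipeline(self, pipeline_id, name):
        # Record the random state that was used when the pipeline was fit.
        self._results[pipeline_id] = (name, self.random_state)

    def get_pipeline(self, pipeline_id):
        # No random_state argument: reuse the value recorded at fit time.
        name, random_state = self._results[pipeline_id]
        return ToyPipeline(name, random_state)


search = ToySearch(random_state=5)
search.fit_pipeline(0, "baseline")
pipeline = search.get_pipeline(0)
print(pipeline.random_state)  # 5, the value passed to ToySearch
```

With this shape, a caller cannot accidentally get a pipeline whose random state differs from the one used during the search.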

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
bchen1116 commented, Jan 21, 2021

Through discussion with @freddyaboulton, we decided it’s fine for the pipeline `random_state` to be an `np.random.RandomState` object, but we want to make sure it’s seeded with the `random_state` int that a user passes in, or is the `np.random.RandomState` object that a user passes in, depending on the input.

This means that if a user passes `random_state=4` to `AutoMLSearch`, we create the pipeline `random_state` with 4 as the seed, and if a user passes `random_state=np.random.RandomState()`, we pass that object through to create the pipeline instead.
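The rule described in this comment could be sketched roughly as follows. `resolve_random_state` is a hypothetical helper written for this example and is not taken from EvalML's code; it only assumes NumPy's `np.random.RandomState` class.

```python
import numpy as np


def resolve_random_state(random_state=0):
    """Hypothetical helper: turn user input into an np.random.RandomState."""
    if isinstance(random_state, np.random.RandomState):
        # The user supplied a generator: pass it through unchanged.
        return random_state
    if isinstance(random_state, (int, np.integer)):
        # The user supplied an int: seed a fresh generator with it.
        return np.random.RandomState(int(random_state))
    raise TypeError("random_state must be an int or np.random.RandomState")


# Two generators seeded with the same int produce the same draws.
rng_a = resolve_random_state(4)
rng_b = resolve_random_state(4)
same_draws = (
    rng_a.randint(0, 100, size=3).tolist()
    == rng_b.randint(0, 100, size=3).tolist()
)

# A user-supplied generator is returned as the same object.
user_rng = np.random.RandomState(4)
passed_through = resolve_random_state(user_rng) is user_rng
```

scikit-learn's `sklearn.utils.check_random_state` follows a similar pattern and additionally accepts `None`.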

0 reactions
bchen1116 commented, Jan 21, 2021

@freddyaboulton I think the largest issue is that in our `pipeline_base`, we set the `random_state` to be an `np.random.RandomState` object.

If we remove the `np.random.RandomState` object, we can preserve the random state that the user passes in, but we would also need to change the default `random_state` behavior for `component_base`. This issue does seem to be tied to the discussion about support for `np.random.RandomState`. What do you think?
