Get rid of the `random_state` argument in `get_pipeline`
Currently, the `best_pipeline` property in `AutoMLSearch` always returns a pipeline with `random_state=0`, because that is the default value of `random_state` in the `get_pipeline` method.
I think this is confusing: if I create an `AutoMLSearch` object with `random_state=5`, for example, I expect all of the pipelines created and returned by `AutoMLSearch` to have `random_state=5`.
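The behavior described above can be reproduced with a small, self-contained sketch. `Pipeline` and `AutoMLSearchSketch` are hypothetical stand-ins written for illustration, not EvalML's actual classes; they only mimic the pattern the issue describes, where `best_pipeline` calls `get_pipeline` without a `random_state` and therefore picks up the default of 0.

```python
# Hypothetical, simplified stand-ins; not EvalML's actual classes.
class Pipeline:
    def __init__(self, parameters, random_state=0):
        self.parameters = parameters
        self.random_state = random_state


class AutoMLSearchSketch:
    def __init__(self, random_state=0):
        self.random_state = random_state
        self._results = {}  # pipeline id -> parameters found during the search

    def search(self):
        # Pretend the search evaluated one pipeline with these parameters.
        self._results[0] = {"max_depth": 6}

    def get_pipeline(self, pipeline_id, random_state=0):
        # Mirrors the issue: the default of 0 is used unless the caller
        # explicitly passes a different random_state.
        return Pipeline(self._results[pipeline_id], random_state=random_state)

    @property
    def best_pipeline(self):
        # No random_state is passed here, so the default 0 wins.
        return self.get_pipeline(0)


automl = AutoMLSearchSketch(random_state=5)
automl.search()
print(automl.best_pipeline.random_state)  # 0, not 5
```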
I think we should remove the `random_state` argument from `get_pipeline` and instead make sure that the random state is set to whatever it was when the pipeline was fit.
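A minimal sketch of the proposed change, assuming the search records the random state each pipeline was fit with. The names (`Pipeline`, `AutoMLSearchSketch`, `_results`) are hypothetical and not taken from EvalML; the point is only that `get_pipeline` no longer accepts `random_state` and reuses the recorded value instead.

```python
# Hypothetical sketch of the proposal; names are illustrative, not EvalML's API.
class Pipeline:
    def __init__(self, parameters, random_state=0):
        self.parameters = parameters
        self.random_state = random_state


class AutoMLSearchSketch:
    def __init__(self, random_state=0):
        self.random_state = random_state
        # pipeline id -> (parameters, random_state the pipeline was fit with)
        self._results = {}

    def search(self):
        # Record the random state actually used when fitting this pipeline.
        self._results[0] = ({"max_depth": 6}, self.random_state)

    def get_pipeline(self, pipeline_id):
        # No random_state argument: rebuild with the value recorded at fit time.
        parameters, random_state = self._results[pipeline_id]
        return Pipeline(parameters, random_state=random_state)

    @property
    def best_pipeline(self):
        return self.get_pipeline(0)


automl = AutoMLSearchSketch(random_state=5)
automl.search()
print(automl.best_pipeline.random_state)  # 5
```

Because the value is stored at fit time, `best_pipeline` and any direct `get_pipeline` call return the seed the search actually used, and there is no argument left for a caller to get wrong.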
Issue Analytics
- Created 3 years ago
- Comments: 5 (4 by maintainers)
Top GitHub Comments
Through discussion with @freddyaboulton, we decided it's fine to have the pipeline `random_state` be an `np.RandomState` object, but we want to make sure it's seeded with the `random_state` int that a user passes in, OR with the `np.RandomState` object that a user passes in, depending on the input. This means that if a user passes in `random_state=4` to `AutoMLSearch`, we create the pipeline `random_state` with 4 as the seed, and if a user passes in `random_state=np.RandomState()`, we pass that object to create the pipeline instead.

@freddyaboulton I think the largest issue is that in our `pipeline_base`, we set the `random_state` to be an `np.RandomState` object, as seen here. If we remove the `np.RandomState` object, then we can preserve the random state that the user passes in, but we would also need to change the default `random_state` behavior for `component_base`. This issue does seem to be tied to a discussion about support for `np.RandomState`. What do you think?
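The seeding rule from the first comment (an int becomes the seed of a new RandomState, while an existing RandomState is passed through unchanged) can be sketched as below. In NumPy the class lives at `np.random.RandomState`. `to_random_state` and `PipelineSketch` are hypothetical names, not EvalML code; the helper behaves like scikit-learn's `sklearn.utils.check_random_state` for these two input types. Keeping the user's original input alongside the derived object is one possible, assumed answer to the concern in the second comment that converting to a RandomState loses the value the user passed in.

```python
import numbers

import numpy as np


def to_random_state(value):
    """Hypothetical helper: seed a RandomState from an int, or pass one through."""
    if isinstance(value, np.random.RandomState):
        # The user supplied a RandomState: use that exact object.
        return value
    if isinstance(value, numbers.Integral):
        # The user supplied an int (including NumPy integer types): use it as the seed.
        return np.random.RandomState(value)
    raise TypeError(
        f"random_state must be an int or np.random.RandomState, got {type(value)!r}"
    )


class PipelineSketch:
    """Hypothetical pipeline that keeps both the user's input and the derived RandomState."""

    def __init__(self, random_state=0):
        # Keep exactly what the user passed, so it can be reported or reused later.
        self.random_state_input = random_state
        # Derive the RandomState that components would actually draw from.
        self.random_state = to_random_state(random_state)


a = PipelineSketch(random_state=4)
b = PipelineSketch(random_state=4)
# The same int seed produces the same first draw.
print(a.random_state.randint(1000) == b.random_state.randint(1000))  # True
```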