RFC design of random_state
This is inspired by #14034, but we have had several similar issues over the years.
The current design of random_state is often hard to understand and confusing for users.
I think we should rethink how random_state works. Maybe that’s a 1.0 issue, I’m not sure.
What I find most confusing is the behavior of passing a RandomState object to random_state: the object is then stateful across calls to fit, which violates our contract that fit is idempotent. Because fit (or possibly even predict?) consumes the random state object, the object is mutated.
I don’t see a real use-case for passing a random state object and think we might want to deprecate that. At least we should never store it imho.
There have been countless bugs because of this, and I think they are pretty easily avoidable.
Another question is the behavior of random_state=None which can also be confusing. Repeated calls to fit result in different models. Sometimes that’s good, sometimes that’s bad. There would be ways to change this, but I’m not sure if it’s a good idea.
The bug in #14034 is also present when random_state=None, so just deprecating passing RandomState would not avoid these kinds of bugs in the future.
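The three cases discussed above (an integer seed, a RandomState instance, and None) can be illustrated with a minimal sketch. ToyEstimator and as_random_state are hypothetical names invented for this example; as_random_state is a simplified stand-in for sklearn.utils.check_random_state, not scikit-learn's actual code, and it differs for None as noted in the comments.

```python
import numpy as np


def as_random_state(seed):
    """Simplified, hypothetical stand-in for sklearn.utils.check_random_state."""
    if isinstance(seed, np.random.RandomState):
        # The caller's object is reused, so every draw mutates it.
        return seed
    # int -> fresh generator with that seed; None -> fresh OS-entropy seed.
    # (scikit-learn itself returns NumPy's global RandomState for None.)
    return np.random.RandomState(seed)


class ToyEstimator:
    """Hypothetical estimator whose 'model' is just three random numbers."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self):
        rng = as_random_state(self.random_state)
        self.coef_ = rng.normal(size=3)
        return self


def two_fits_agree(random_state):
    """Fit the same estimator twice and report whether the results match."""
    est = ToyEstimator(random_state=random_state)
    first = est.fit().coef_.copy()
    second = est.fit().coef_.copy()
    return np.array_equal(first, second)


int_seed_idempotent = two_fits_agree(0)
instance_idempotent = two_fits_agree(np.random.RandomState(0))
none_idempotent = two_fits_agree(None)
print(int_seed_idempotent, instance_idempotent, none_idempotent)
```

In this sketch only the integer seed gives identical results on repeated fits. The RandomState instance is advanced by the first fit, and None draws fresh entropy each time, which is why deprecating RandomState instances alone would not remove the non-idempotent case.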

This option can also be used to deal with random_state=None or our current situation, although this would mean storing the result of RandomState.get_state, which contains 624 unsigned ints.

Because we want __init__ and set_params to behave consistently, to simplify the assumptions we can make about parametrized models and their integration with hyperparameter selection tools. But maybe we can special-case random_state=None.
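For reference, here is a small sketch of what storing RandomState.get_state involves, using only NumPy's legacy RandomState API. It is an illustration of the snapshot-and-restore idea mentioned in the comment above, not a proposal for how scikit-learn would implement it; the variable names are made up for the example.

```python
import numpy as np

rng = np.random.RandomState(42)

# With the default legacy format, get_state() returns a tuple:
# (bit generator name, key array, position, has_gauss, cached_gaussian).
state = rng.get_state()
name, keys = state[0], state[1]
print(name, keys.shape, keys.dtype)

# Drawing numbers advances the generator...
first = rng.uniform(size=3)

# ...and restoring the snapshot replays exactly the same stream.
rng.set_state(state)
second = rng.uniform(size=3)

replayed = np.array_equal(first, second)
print(replayed)
```

The key array is the 624-element unsigned integer state of the Mersenne Twister, which is the storage cost the comment refers to.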