RFC design of random_state

This is inspired by #14034, but we have had several similar issues over the years. The current design of random_state is often hard to understand and confusing for users. I think we should rethink how random_state works. Maybe that’s a 1.0 issue, I’m not sure.

What I find most confusing is the behavior of passing a RandomState object to random_state, because that makes the estimator stateful across calls to fit, so it violates our contract of fit being idempotent: fit (or possibly even predict?) consumes the random state object, so the object is mutated by every call. I don’t see a real use case for passing a RandomState object and think we might want to deprecate that. At least we should never store it, imho.
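A minimal sketch of the problem described above, using only NumPy. ToyEstimator is a hypothetical stand-in, not a scikit-learn class; its fit simply draws three numbers, and the int-versus-instance handling is an assumption that mirrors the usual convention of seeding a fresh generator from an int and using a passed RandomState directly.

```python
import numpy as np


class ToyEstimator:
    """Hypothetical estimator: the "model" is just a random draw."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self):
        rs = self.random_state
        if isinstance(rs, np.random.RandomState):
            # The caller's generator is used directly, so fit mutates it.
            rng = rs
        else:
            # An int (or None) seeds a fresh generator on every call.
            rng = np.random.RandomState(rs)
        self.coef_ = rng.normal(size=3)
        return self


# Integer seed: two fits produce the same result.
seeded = ToyEstimator(random_state=0)
first_seeded = seeded.fit().coef_
second_seeded = seeded.fit().coef_
same_with_int = np.array_equal(first_seeded, second_seeded)

# RandomState instance: the second fit continues the consumed stream.
shared = ToyEstimator(random_state=np.random.RandomState(0))
first_shared = shared.fit().coef_
second_shared = shared.fit().coef_
same_with_instance = np.array_equal(first_shared, second_shared)

print("int seed repeatable:", same_with_int)
print("RandomState instance repeatable:", same_with_instance)
```

With the instance, the first fit matches the seeded estimator, but the second fit picks up wherever the first one left the generator.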

There have been countless bugs because of this, and I think they are pretty easily avoidable.

Another question is the behavior of random_state=None which can also be confusing. Repeated calls to fit result in different models. Sometimes that’s good, sometimes that’s bad. There would be ways to change this, but I’m not sure if it’s a good idea. The bug in #14034 is also present when random_state=None, so just deprecating passing RandomState would not avoid these kinds of bugs in the future.

Issue Analytics

  • State: open
  • Created 4 years ago
  • Reactions: 3
  • Comments: 13 (13 by maintainers)

Top GitHub Comments

1 reaction
thomasjpfan commented, Sep 13, 2019

Another option is to change how RandomState is handled, by copying the state in fit rather than mutating it.

This option can also be used to deal with random_state=None or our current situation, although it would mean storing the result of RandomState.get_state, which contains 624 unsigned ints.
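One way the copying idea might look, as a rough sketch rather than a proposed implementation. copy_random_state and CopyingEstimator are hypothetical names; the only NumPy calls used are RandomState.get_state and RandomState.set_state.

```python
import numpy as np


def copy_random_state(rs):
    """Return an independent RandomState with the same internal state."""
    clone = np.random.RandomState()
    clone.set_state(rs.get_state())
    return clone


class CopyingEstimator:
    """Hypothetical estimator that never consumes the caller's generator."""

    def __init__(self, random_state):
        self.random_state = random_state

    def fit(self):
        # Draw from a copy, leaving self.random_state untouched.
        rng = copy_random_state(self.random_state)
        self.coef_ = rng.normal(size=3)
        return self


rs = np.random.RandomState(42)
est = CopyingEstimator(rs)
a = est.fit().coef_
b = est.fit().coef_
repeatable = np.array_equal(a, b)

# The Mersenne Twister key is the 624-element array mentioned above.
state_words = rs.get_state()[1]
print(repeatable, state_words.shape, state_words.dtype)
```

Because fit draws from a copy, repeated fits agree and the caller's generator is left exactly as it was passed in. Covering random_state=None the same way would additionally require storing a captured state on the estimator, which is the cost noted above.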

1 reaction
ogrisel commented, Sep 13, 2019

Maybe mutating __init__ for random state is OK then, I don’t know. Do you remember the initial motivation for not allowing it?

Because we want __init__ and set_params to behave consistently, to ease the assumptions we can make about parametrized models and their integration with hyperparameter selection tools. But maybe we can special-case random_state=None.
