RFC design of random_state

This is inspired by #14034, but we have had several similar issues over the years. The current design of random_state is often hard for users to understand. I think we should rethink how random_state works. Maybe that’s a 1.0 issue; I’m not sure.

What I find most confusing is the behavior of passing a RandomState object to random_state, because that makes the object stateful across calls to fit, which violates our contract that fit is idempotent: fit (or possibly even predict?) consumes the random state object, so the object is mutated. I don’t see a real use case for passing a RandomState object and think we might want to deprecate that. At least we should never store it, imho.
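The mutation problem described above can be sketched without scikit-learn. `ToyEstimator` below is a hypothetical class, not part of any library; it only assumes that an estimator reuses a RandomState instance when given one and seeds a fresh generator when given an int.

```python
import numpy as np


class ToyEstimator:
    """Hypothetical estimator used only for illustration (not a scikit-learn class)."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self):
        rs = self.random_state
        # Assumption: an instance is used as-is (and therefore mutated),
        # anything else seeds a brand-new generator on every call.
        rng = rs if isinstance(rs, np.random.RandomState) else np.random.RandomState(rs)
        self.coef_ = rng.normal(size=3)
        return self


# Passing an instance: the second fit continues where the first one stopped.
shared = ToyEstimator(random_state=np.random.RandomState(0))
first = shared.fit().coef_.copy()
second = shared.fit().coef_.copy()
instance_is_idempotent = np.array_equal(first, second)

# Passing an int: every fit starts from the same seed.
seeded = ToyEstimator(random_state=0)
int_is_idempotent = np.array_equal(seeded.fit().coef_, seeded.fit().coef_)

print(instance_is_idempotent, int_is_idempotent)
```

With the instance, the two fits see different random draws; with the int seed, they see the same ones.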

There have been countless bugs because of this, and I think they are pretty easily avoidable.

Another question is the behavior of random_state=None which can also be confusing. Repeated calls to fit result in different models. Sometimes that’s good, sometimes that’s bad. There would be ways to change this, but I’m not sure if it’s a good idea. The bug in #14034 is also present when random_state=None, so just deprecating passing RandomState would not avoid these kinds of bugs in the future.
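A rough sketch of the random_state=None case, assuming None means "draw from NumPy's global legacy generator" (scikit-learn's check_random_state is documented as returning the singleton used by np.random in that case). `toy_fit` is a hypothetical stand-in for an estimator's fit, not a real API.

```python
import numpy as np


def toy_fit(random_state=None):
    """Hypothetical stand-in for fit; returns the 'learned' weights."""
    if random_state is None:
        # Assumption: None falls back to NumPy's global legacy generator.
        return np.random.normal(size=3)
    return np.random.RandomState(random_state).normal(size=3)


# Two consecutive calls consume the global state, so the results differ.
a = toy_fit()
b = toy_fit()
none_repeats = np.array_equal(a, b)

# Results repeat only if the caller reseeds the global generator in between.
np.random.seed(42)
c = toy_fit()
np.random.seed(42)
d = toy_fit()
reseeded_repeats = np.array_equal(c, d)

print(none_repeats, reseeded_repeats)
```

The result of a fit then depends on everything else that touched the global generator beforehand, which is the same kind of hidden coupling as with a shared RandomState instance.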

Top GitHub Comments

thomasjpfan commented, Sep 13, 2019

Another option is to change how RandomState is handled, by copying the state in fit rather than mutating it.

This option could also be used to deal with random_state=None or our current situation, although it would mean storing the result of RandomState.get_state(), which contains 624 unsigned ints.
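A minimal sketch of the copy-instead-of-mutate idea, using only NumPy's RandomState.get_state and set_state. The helper name `copy_random_state` is hypothetical; this is one way the proposal might look, not how scikit-learn implements it.

```python
import numpy as np


def copy_random_state(rng):
    """Return an independent RandomState starting from rng's current state."""
    clone = np.random.RandomState()
    clone.set_state(rng.get_state())
    return clone


user_rng = np.random.RandomState(0)
before = user_rng.get_state()

# Each simulated "fit" draws from its own copy, so the user's object is untouched.
draws_1 = copy_random_state(user_rng).normal(size=3)
draws_2 = copy_random_state(user_rng).normal(size=3)

after = user_rng.get_state()

# The Mersenne Twister key is the array of 624 unsigned 32-bit ints mentioned above.
key_length = before[1].shape[0]
unchanged = np.array_equal(before[1], after[1]) and before[2] == after[2]

print(np.array_equal(draws_1, draws_2), key_length, unchanged)
```

The cost is the one noted in the comment: the copied state tuple, including the 624-element key, has to be kept somewhere if repeated fits are to start from the same point.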

ogrisel commented, Sep 13, 2019

Maybe mutating __init__ for random state is OK then, I don’t know. Do you remember the initial motivation for not allowing it?

Because we want __init__ and set_params to behave consistently, to simplify the assumptions we can make about parametrized models and their integration with hyperparameter selection tools. But maybe we can special-case random_state=None.
