_fit_regressor in stochastic_gradient.py does not use random state for call to make_dataset

Describe the bug

See here:

    def _fit_regressor(self, X, y, alpha, C, loss, learning_rate,
                       sample_weight, max_iter):
        dataset, intercept_decay = make_dataset(X, y, sample_weight)

make_dataset is not passed a random seed, so it picks one using the global random state. This makes it harder to reproduce the same results across runs of the same code.

The analogous code in fit_binary does pass it correctly.
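The sketch below makes the mechanism concrete by contrasting the two call patterns. It does not use scikit-learn. `GLOBAL_RNG`, `check_random_state`, `make_dataset_sketch`, `fit_regressor_buggy` and `fit_regressor_fixed` are hypothetical stand-ins built on Python's stdlib `random` module, which replaces NumPy's global RNG as an assumption. The "dataset" is simply a dict that holds the drawn seed.

```python
import random

MAX_INT = 2**31 - 1  # largest 32-bit signed integer, used as the seed upper bound

# Hypothetical stand-in for process-wide global random state. The real code
# relies on NumPy's global RNG; a stdlib random.Random is assumed here instead.
GLOBAL_RNG = random.Random()


def check_random_state(seed):
    """Hypothetical helper: map None / int / random.Random to a generator."""
    if seed is None:
        return GLOBAL_RNG  # no seed given: fall back to the shared global state
    if isinstance(seed, int):
        return random.Random(seed)  # fresh, deterministic generator
    if isinstance(seed, random.Random):
        return seed  # reuse the caller's generator as-is
    raise ValueError(f"{seed!r} cannot be used to seed a random.Random instance")


def make_dataset_sketch(X, y, random_state=None):
    """Hypothetical simplified make_dataset that draws a seed for the dataset."""
    rng = check_random_state(random_state)
    seed = rng.randint(1, MAX_INT)  # consumes state from whichever rng was chosen
    return {"X": X, "y": y, "seed": seed}


def fit_regressor_buggy(X, y, random_state):
    # Reported pattern: random_state is accepted but never forwarded,
    # so make_dataset_sketch draws from GLOBAL_RNG.
    return make_dataset_sketch(X, y)


def fit_regressor_fixed(X, y, random_state):
    # fit_binary-style pattern: the estimator's random state is forwarded.
    rng = check_random_state(random_state)
    return make_dataset_sketch(X, y, random_state=rng)


if __name__ == "__main__":
    X, y = [[0.0], [1.0]], [0.0, 1.0]
    for global_seed in (123, 456):
        GLOBAL_RNG.seed(global_seed)  # simulate a different global state per run
        buggy = fit_regressor_buggy(X, y, random_state=42)["seed"]
        fixed = fit_regressor_fixed(X, y, random_state=42)["seed"]
        print(f"global seed {global_seed}: buggy={buggy} fixed={fixed}")
```

In this sketch, `fixed` is the same on both printed lines because it depends only on `random_state=42`. `buggy` follows whatever state `GLOBAL_RNG` happens to be in. The buggy path is reproducible only if the global generator is re-seeded before every call.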

Versions

I verified that the problematic code is still present as of commit 28ee486b44f8e7e6440f3439e7315ba1e6d35e43.

Issue Analytics

State:
Created 3 years ago
Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction

mitar commented, Mar 11, 2021

@PierreAttard Go for it.

0 reactions

mitar commented, Apr 6, 2022

So how can one then control randomness in shuffling?
