_fit_regressor in stochastic_gradient.py does not use random state for call to make_dataset

Describe the bug

See here:

    def _fit_regressor(self, X, y, alpha, C, loss, learning_rate,
                       sample_weight, max_iter):
        dataset, intercept_decay = make_dataset(X, y, sample_weight)

make_dataset is not passed a random seed, so it picks one from the global random state. This makes it harder to reproduce the same results between runs of the same code.

Similar code for fit_binary does pass it correctly.
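
To illustrate why a missing seed matters, here is a minimal sketch of the pattern described above. It is not scikit-learn's code: `draw_shuffle_seed`, `fit_regressor_sketch` and the `forward_seed` flag are hypothetical names invented for this example, and Python's standard `random` module stands in for NumPy's random state so that the snippet runs without extra dependencies.

```python
import random

MAX_INT = 2**31 - 1


def draw_shuffle_seed(random_state=None):
    """Hypothetical helper: choose the seed used to shuffle the samples."""
    if random_state is None:
        # No seed was passed in, so fall back to the process-wide generator.
        # The result depends on whatever global state happens to exist.
        return random.randint(0, MAX_INT)
    # A seed was passed in, so derive the value from a private generator.
    return random.Random(random_state).randint(0, MAX_INT)


def fit_regressor_sketch(n_samples, random_state=None, forward_seed=True):
    """Return the order in which samples would be visited.

    forward_seed=False mimics the reported behaviour: the estimator has a
    random_state, but it is not handed on to the dataset helper.
    """
    seed = draw_shuffle_seed(random_state if forward_seed else None)
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    return order


if __name__ == "__main__":
    # Seed forwarded: the order is the same on every call.
    print(fit_regressor_sketch(5, random_state=0))
    print(fit_regressor_sketch(5, random_state=0))
    # Seed not forwarded: the order follows the global generator instead.
    print(fit_regressor_sketch(5, random_state=0, forward_seed=False))
    print(fit_regressor_sketch(5, random_state=0, forward_seed=False))
```

In this sketch, the calls with `forward_seed=False` are only repeatable if the global generator is seeded (for example with `random.seed(1)`) immediately before each call, which is the kind of hidden dependency the report describes.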

Versions

I verified that the problematic code is still present at commit 28ee486b44f8e7e6440f3439e7315ba1e6d35e43.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
mitar commented, Mar 11, 2021

@PierreAttard Go for it.

0 reactions
mitar commented, Apr 6, 2022

So how can one control the randomness in shuffling, then?


