Use modifiable global random state in tests

As mentioned by @jnothman in https://github.com/scikit-learn/scikit-learn/issues/13846#issuecomment-494175027

Relatedly, I proposed having a random_seed fixture that was globally set to different values on different testing runs. One benefit would be that we could easily distinguish those tests that are invariant under changing random seed from those that are brittle.

I think it would be a good idea. For instance, we could,

  • create a global auto-use pytest fixture in scikit-learn/conftest.py,
    import os
    import numpy as np
    import pytest

    @pytest.fixture(scope="session")
    def pytest_rng():
        random_seed = int(os.environ.get('SKLEARN_TEST_RNG', 42))  # env values are strings
        return np.random.RandomState(random_seed)
    
  • modify tests to use it, e.g.
    -  def test_something():
    +  def test_something(pytest_rng):
    -     rng = np.random.RandomState(0)
    -     est = Estimator(random_state=rng)
    +     est = Estimator(random_state=pytest_rng)
    

One issue is that global auto-use fixtures are a bit magical, but I’m hoping that naming it pytest_rng would be explicit enough.

Edit: updated to avoid using an auto-use fixture.
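
For comparison, the random_seed variant quoted at the top of this issue could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper resolve_seed, the fixture name random_seed and the example test are hypothetical names, not existing scikit-learn code, and the only piece taken from the proposal is the SKLEARN_TEST_RNG environment variable. When the variable is unset, the sketch draws a new seed on every run instead of defaulting to 42.

```python
import os
import random

import numpy as np
import pytest


def resolve_seed(environ=os.environ):
    """Return the seed from SKLEARN_TEST_RNG, or draw a fresh one for this run.

    Hypothetical helper, not part of scikit-learn.
    """
    raw = environ.get("SKLEARN_TEST_RNG")
    if raw is None:
        # No seed requested: pick a different one on each testing run.
        # RandomState accepts integer seeds in [0, 2**32 - 1].
        return random.SystemRandom().randrange(2**32)
    # Environment variables are strings, so convert explicitly.
    return int(raw)


@pytest.fixture(scope="session")
def random_seed():
    # Resolved once per session, so every test in a run sees the same seed.
    return resolve_seed()


def test_permutation_keeps_elements(random_seed):
    # Each test builds its own RandomState from the shared seed,
    # so no random stream is shared between tests.
    rng = np.random.RandomState(random_seed)
    data = np.arange(10)
    assert np.array_equal(np.sort(rng.permutation(data)), data)
```

Passing an integer seed rather than a shared RandomState instance is a design choice in this sketch: with a session-scoped RandomState, the numbers a test draws depend on which tests consumed the stream before it. A failing run can then be replayed by exporting SKLEARN_TEST_RNG with the seed that was used, provided the run reports that seed somewhere.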

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

jnothman commented, May 21, 2019

The idea would be to specifically identify tests that are robust to changes of random state.
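
One way to make that distinction concrete, outside of any pytest machinery, is to run the same check under several seeds and record where it fails. The sketch below is an assumption-laden illustration: failing_seeds and both check functions are hypothetical names invented for this example, and the range of 20 seeds is arbitrary.

```python
import numpy as np


def failing_seeds(check, seeds):
    """Run check(rng) once per seed; return the seeds where it raises AssertionError."""
    failures = []
    for seed in seeds:
        try:
            check(np.random.RandomState(seed))
        except AssertionError:
            failures.append(seed)
    return failures


def check_sort_is_ordered(rng):
    # Holds for any random input, so it is invariant under the seed.
    x = rng.normal(size=50)
    assert np.all(np.diff(np.sort(x)) >= 0)


def check_first_draw_is_positive(rng):
    # Holds only for some seeds: a standard normal draw is
    # negative about half of the time.
    assert rng.normal() > 0


seeds = range(20)
robust = failing_seeds(check_sort_is_ordered, seeds)
brittle = failing_seeds(check_first_draw_is_positive, seeds)
print("seeds failing the robust check:", robust)
print("seeds failing the brittle check:", brittle)
```

A check that passes for every seed is a candidate for using the shared fixture as-is, while one that fails for some seeds either depends on a lucky seed or has a tolerance that is too tight.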

