
Use modifiable global random state in tests


As mentioned by @jnothman in https://github.com/scikit-learn/scikit-learn/issues/13846#issuecomment-494175027

Relatedly, I proposed having a random_seed fixture that was globally set to different values on different testing runs. One benefit would be that we could easily distinguish those tests that are invariant under changing random seed from those that are brittle.
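The distinction between seed-invariant and brittle tests can be illustrated with a minimal NumPy-only sketch. It runs the same check under several seeds and labels the check invariant only if it passes for every one of them. The helpers passes_for_seed and classify, and the two example checks, are hypothetical and written only for illustration. With the proposed fixture, the equivalent sweep would come from rerunning the test suite with different global seed values.

```python
import numpy as np


def passes_for_seed(check, seed):
    """Hypothetical helper: run `check` with a RandomState seeded by `seed`.

    Returns True if no assertion fails.
    """
    try:
        check(np.random.RandomState(seed))
    except AssertionError:
        return False
    return True


def classify(check, seeds=range(10)):
    """Hypothetical helper: 'invariant' if `check` passes for every seed, else 'brittle'."""
    return "invariant" if all(passes_for_seed(check, s) for s in seeds) else "brittle"


def robust_check(rng):
    # The mean of 10,000 standard normals has a standard error of 0.01,
    # so a 0.1 tolerance (10 standard errors) holds for essentially any seed.
    x = rng.standard_normal(10_000)
    assert abs(x.mean()) < 0.1


def brittle_check(rng):
    # Depends on the exact value of a single uniform draw,
    # so it passes for only about half of all seeds.
    assert rng.rand() < 0.5


result_robust = classify(robust_check)
result_brittle = classify(brittle_check)
```

Here robust_check asserts a statistical property with a wide tolerance. In contrast, brittle_check depends on one particular draw, which is exactly the kind of test that a changing global seed would expose.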

I think such a fixture would be a good idea. For instance, we could:

create a global ~~auto-use~~ pytest fixture in scikit-learn/conftest.py:

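import os
import numpy as np
import pytest

@pytest.fixture  # function scope: each test gets a freshly seeded RandomState
def pytest_rng():
    random_seed = int(os.environ.get('SKLEARN_TEST_RNG', 42))  # env values are str
    return np.random.RandomState(random_seed)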

modify tests to use it, for example:

-  def test_something():
+  def test_something(pytest_rng):
-     rng = np.random.RandomState(0)
-     est = Estimator(random_state=rng)
+     est = Estimator(random_state=pytest_rng)
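Both sides of the diff pass a RandomState instance, rather than an integer, as random_state. The following minimal sketch shows why that distinction matters. It assumes the usual scikit-learn convention: an integer seeds a fresh generator on every fit, while an instance is consumed and advanced in place. ToyEstimator and as_random_state are hypothetical stand-ins, and the latter is only loosely modeled on sklearn.utils.check_random_state.

```python
import numbers

import numpy as np


def as_random_state(seed):
    """Hypothetical stand-in for sklearn.utils.check_random_state."""
    if seed is None:
        return np.random.mtrand._rand  # NumPy's global RandomState
    if isinstance(seed, numbers.Integral):
        return np.random.RandomState(seed)  # fresh generator from an int seed
    if isinstance(seed, np.random.RandomState):
        return seed  # the instance itself is used and advanced in place
    raise ValueError(f"{seed!r} cannot be used to seed a RandomState instance")


class ToyEstimator:
    """Hypothetical estimator whose fit() draws random initial weights."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self, n_features=3):
        rng = as_random_state(self.random_state)
        self.coef_ = rng.standard_normal(n_features)
        return self


# Shared instance: each fit consumes the next draws from the same stream.
shared = ToyEstimator(random_state=np.random.RandomState(0))
first = shared.fit().coef_.copy()
second = shared.fit().coef_.copy()

# Integer seed: each fit builds a fresh RandomState(0), so results repeat.
seeded = ToyEstimator(random_state=0)
third = seeded.fit().coef_.copy()
fourth = seeded.fit().coef_.copy()
```

A shared instance advances with every draw. As a result, a single RandomState reused across tests (as a session-scoped fixture would do) makes the data each test sees depend on which tests ran before it.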

One issue is that global auto-use fixtures are a bit magical, but I’m hoping that naming it pytest_rng would make it explicit enough.

Edit: updated to avoid using an auto-use fixture.
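The fixture reads its seed from the SKLEARN_TEST_RNG environment variable. Environment values are always strings, and NumPy's RandomState expects an integer (or an array of integers), so the value needs to be converted before seeding. The following minimal sketch shows that logic as a plain function. make_test_rng is a hypothetical helper, not part of scikit-learn, and it takes the environment as a mapping so it can be exercised without modifying os.environ.

```python
import os

import numpy as np


def make_test_rng(environ=None, default=42):
    """Hypothetical helper mirroring the fixture body: seed from SKLEARN_TEST_RNG."""
    if environ is None:
        environ = os.environ
    # Environment values are strings, so cast to int before seeding.
    seed = int(environ.get("SKLEARN_TEST_RNG", default))
    return np.random.RandomState(seed)


# Simulate two runs with the same seed and one run with a different seed.
run_a = make_test_rng({"SKLEARN_TEST_RNG": "7"}).rand(3)
run_b = make_test_rng({"SKLEARN_TEST_RNG": "7"}).rand(3)
run_c = make_test_rng({"SKLEARN_TEST_RNG": "8"}).rand(3)
default_run = make_test_rng({}).rand(3)  # variable unset: falls back to 42
```

Under this scheme, one CI job could run the suite with SKLEARN_TEST_RNG=7 and another job with a different value. A test that fails only under some values is then flagged as seed-dependent, while every run remains reproducible from its seed.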

Issue Analytics

Created 4 years ago
Comments: 9 (9 by maintainers)

Top GitHub Comments

1 reaction

jnothman commented, May 21, 2019

The idea would be to specifically identify tests that are robust to changes in the random state.

0 reactions

jeremiedbb commented, Mar 12, 2022

In progress in https://github.com/scikit-learn/scikit-learn/pull/22749

Top Results From Across the Web

Do not modify global random state · Issue #39716 - GitHub

I would like to propose that instead all functions which need a random source accept a local, non-global, random_seed / random_state argument to ......

python - What is "random-state" in sklearn.model_selection ...

Random state ensures that the splits that you generate are reproducible. Scikit-learn uses random permutations to generate the splits ...

Why ML model produces different results despite ...

Given that sklearn does not have its own global random seed but uses the numpy random seed we can set it globally with...

Why do we set a random state in machine learning models?

The random state hyperparameter is used to control the randomness involved in machine learning models. We can use cross-validation to mitigate the effect...

Legacy Random Generation — NumPy v1.25.dev0 Manual

RandomState adds additional information to the state which is required when using Box-Muller normals since these are produced in pairs.