Reproducibility issue: `ray.init()` messes up `random.seed(SEED)`
What is the problem?
Simply running `ray.init()` breaks seeding performed before it with the `random` standard-library module. Interestingly enough, this does not happen with `np.random`.
ray: 0.8.6 python: 3.8.3 OS: macOS 10.15.4
Reproduction
Code:
import random
import ray
SEED = 42
random.seed(SEED)
rand1 = random.random()
random.seed(SEED)
ray.init()
rand2 = random.random()
print("rand1:", rand1)
print("rand2:", rand2)
Output:
# ray initialization output...
rand1: 0.6394267984578837
rand2: 0.025010755222666936
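The simplest mitigation (my assumption, not an official Ray recommendation) is to seed the global generator *after* `ray.init()` rather than before it. A minimal stdlib-only sketch of the pattern, using a plain `random.random()` call as a hypothetical stand-in for the disruptive `ray.init()`:

```python
import random

SEED = 42

random.seed(SEED)
rand1 = random.random()

# Stand-in for ray.init(): any call that consumes or reseeds the
# global generator behind your back.
random.random()

# Workaround: re-seed *after* the disruptive call.
random.seed(SEED)
rand2 = random.random()

print(rand1 == rand2)  # True: the stream is restored
```

Re-seeding resets the Mersenne Twister to the same internal state, so the sequence after the second `random.seed(SEED)` matches the original one exactly.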
Issue Analytics
- Created: 3 years ago
- Reactions: 2
- Comments: 7 (4 by maintainers)
Thanks for the quick responses, guys!
A quick Google search tells me that it's impossible (or a bad idea) to save and restore seeds, and even though you can save and restore the generator state (https://stackoverflow.com/a/32172816), it seems the "most correct" way is to have your own random-number generator that does not affect the global state of `random`, `numpy.random`, etc. So, I guess, something like a private `random.Random(SEED)` instance would do.
This would completely solve the issue with seeds in the main process (the one that calls `ray.init()`). I'm not sure what happens within the worker processes, but that seems less of an issue to me: when you wrote your serial code you were not relying on workers anyway, and the parallelizable bits are most likely deterministic (if not, as in MCMC, it's your job to explicitly select the seeds you want for each worker). I would assume so... this is at least very non-trivial behavior, and it should be warned about.
Edit: the issue is that a typical code-development cycle is: (1) write serial code; (2) make sure everything is reproducible and the results are as expected; (3) try to speed things up with parallelization if needed.
So in (3), after you use some framework to speed up the code, you expect the results to stay the same, and this is an unpleasant surprise... (: