Reproducibility issue: `ray.init()` messes up `random.seed(SEED)`
What is the problem?
Simply running `ray.init()` breaks seeding performed before it with the `random` standard-library module. Interestingly enough, this does not happen with `np.random`.
ray: 0.8.6 python: 3.8.3 OS: macOS 10.15.4
Reproduction
Code:
import random
import ray
SEED = 42
random.seed(SEED)
rand1 = random.random()
random.seed(SEED)
ray.init()
rand2 = random.random()
print("rand1:", rand1)
print("rand2:", rand2)
Output:
# ray initialization output...
rand1: 0.6394267984578837
rand2: 0.025010755222666936
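The simplest mitigation (my assumption, not an official Ray recommendation) is to seed the global generator *after* `ray.init()` rather than before it. A minimal stdlib-only sketch of the pattern, using a plain `random.random()` call as a hypothetical stand-in for the disruptive `ray.init()`:

```python
import random

SEED = 42

random.seed(SEED)
rand1 = random.random()

# Stand-in for ray.init(): any call that consumes or reseeds the
# global generator behind your back.
random.random()

# Workaround: re-seed *after* the disruptive call.
random.seed(SEED)
rand2 = random.random()

print(rand1 == rand2)  # True: the stream is restored
```

Re-seeding resets the Mersenne Twister to the same internal state, so the sequence after the second `random.seed(SEED)` matches the original one exactly.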
Issue Analytics
- Created: 3 years ago
- Reactions: 2
- Comments: 7 (4 by maintainers)
Thanks for the quick responses, guys!
A quick Google search tells me that it's impossible (or a bad idea) to save and restore seeds, and even though you can save and restore the generator state (https://stackoverflow.com/a/32172816), it seems the "most correct" way is to have your own random-number generator that does not affect the global state of `random`, `numpy.random`, etc. So, I guess, something like a private `random.Random(SEED)` instance would do.
This would completely solve the issue with seeds in the main process (the one that calls `ray.init()`). I'm not sure what happens within the worker processes, but that seems less of an issue to me: when you wrote your serial code you were not relying on workers anyway, and the parallelizable bits are most likely deterministic (if not, as in MCMC, it's your job to explicitly select the seeds you want for each worker). I would assume so... this is at least very non-trivial behavior, and it should be warned about.
Edit: the issue is that a typical code-development cycle is: (1) write serial code; (2) make sure everything is reproducible and the results are as expected; (3) try to speed things up with parallelization if needed.
So in (3), after you use some framework to speed up the code, you expect the results to stay the same, and this is an unpleasant surprise... (: