numpy.seed hangs in child processes (minimal example provided)
See original GitHub issue.

I have a Python script that concurrently processes NumPy arrays and images in a random way. To get proper randomness inside the spawned processes, I pass a random seed from the main process to the workers so they can seed themselves.

When I use `maxtasksperchild` for the `Pool`, my script hangs after running `Pool.map` a number of times.

The following minimal snippet reproduces the problem:
```python
from multiprocessing import Pool

import numpy as np

def worker(n):
    # Removing np.random.seed solves the issue
    np.random.seed(1)  # any seed value
    return 1234  # trivial return value

# Removing maxtasksperchild solves the issue
ppool = Pool(20, maxtasksperchild=5)

i = 0
while True:
    i += 1
    # Removing np.random.randint(10) or taking it out of the loop solves the issue
    rand = np.random.randint(10)
    l = [3]  # trivial input to ppool.map
    result = ppool.map(worker, l)
    print i, result[0]
```
This is the output:

```
1 1234
2 1234
3 1234
.
.
.
99 1234
100 1234  # at this point workers should've reached maxtasksperchild tasks
101 1234
102 1234
103 1234
104 1234
105 1234
106 1234
107 1234
108 1234
109 1234
110 1234
[hangs here indefinitely]
```
This snippet hangs on the following platforms:

- OS: Linux (Ubuntu) and OS X
- Python version: 2.7.10
- NumPy versions: 1.11.0, 1.12.0, 1.13.0

If I replace `np.random.randint(10)` with `np.random.random()`, then it works fine with NumPy 1.11 but not with NumPy 1.12 or 1.13.
Issue Analytics

- Created: 6 years ago
- Comments: 22 (20 by maintainers)
This also hangs without NumPy if you replace all `np.random` calls with `with lock: pass`, where `lock` is a global `threading.Lock` defined at the top. Interestingly, it doesn't hang on Python 3.
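The lock-inheritance behavior behind this can be demonstrated directly. The following is a minimal, POSIX-only sketch (names are illustrative, not from the issue): fork while another thread holds a `threading.Lock`, and the child inherits the lock in its held state even though the thread that held it no longer exists in the child.

```python
import os
import threading
import time

lock = threading.Lock()
r, w = os.pipe()  # channel for the child to report what it saw

def hold_lock():
    # Hold the lock long enough for the main thread to fork meanwhile.
    with lock:
        time.sleep(0.5)

t = threading.Thread(target=hold_lock)
t.start()
time.sleep(0.1)  # let the helper thread acquire the lock first

pid = os.fork()
if pid == 0:
    # Child: only the forking thread survives, but the lock's state is copied.
    # A non-blocking acquire fails because the now-nonexistent holder "owns" it.
    held = not lock.acquire(blocking=False)
    os.write(w, b"1" if held else b"0")
    os._exit(0)

os.waitpid(pid, 0)
child_saw_held_lock = os.read(r, 1) == b"1"
print("child inherited a held lock:", child_saw_held_lock)
t.join()
```

A blocking `lock.acquire()` in the child would hang forever, which is exactly the deadlock the snippet above runs into.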
`multiprocessing.Pool` spawns a background thread to manage workers: https://github.com/python/cpython/blob/aefa7ebf0ff0f73feee7ab24f4cdcb2014d83ee5/Lib/multiprocessing/pool.py#L170-L173

It loops in the background calling `_maintain_pool`: https://github.com/python/cpython/blob/aefa7ebf0ff0f73feee7ab24f4cdcb2014d83ee5/Lib/multiprocessing/pool.py#L366

If a worker exits, for example due to a `maxtasksperchild` limit, then `_maintain_pool` calls `_repopulate_pool`: https://github.com/python/cpython/blob/aefa7ebf0ff0f73feee7ab24f4cdcb2014d83ee5/Lib/multiprocessing/pool.py#L240

And then `_repopulate_pool` forks some new workers, still in this background thread: https://github.com/python/cpython/blob/aefa7ebf0ff0f73feee7ab24f4cdcb2014d83ee5/Lib/multiprocessing/pool.py#L224

So what's happening is that eventually you get unlucky: at the same moment that your main thread is calling some `np.random` function and holding the lock, `multiprocessing` decides to fork a child, which starts out with the `np.random` lock already held but with the thread that was holding it gone. Then the child tries to call into `np.random`
, which requires taking the lock, and so the child deadlocks.

The simple workaround here is to not use `fork` with `multiprocessing`. If you use the `spawn` or `forkserver` start methods, this should go away.

For a proper fix… ughhh. I guess we need to register a `pthread_atfork` pre-fork handler that takes the `np.random` lock before fork and then releases it afterwards? And really I guess we need to do this for every lock in NumPy, which requires something like keeping a weakset of every `RandomState` object, and `_FFTCache` also appears to have a lock…

(On the plus side, this would also give us an opportunity to reinitialize the global random state in the child, which we really should be doing in cases where the user hasn't explicitly seeded it.)
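On Python 3, the suggested `spawn` workaround can be selected per-pool with `multiprocessing.get_context`. The following is a minimal sketch of the original repro adapted accordingly (Python 3 syntax, reduced to a single `map` call):

```python
import multiprocessing as mp

import numpy as np

def worker(n):
    np.random.seed(1)  # safe now: the child did not inherit a held lock
    return 1234

if __name__ == "__main__":
    # "spawn" starts each worker as a fresh interpreter instead of forking,
    # so no lock state is inherited from the parent's threads.
    ctx = mp.get_context("spawn")
    with ctx.Pool(4, maxtasksperchild=5) as pool:
        result = pool.map(worker, [3])
    print(result)
```

`forkserver` works the same way here; the key point is simply avoiding the default `fork` start method while other threads may hold locks.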
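Python 3.7+ exposes the same hook at the interpreter level as `os.register_at_fork`. The following is an illustrative, POSIX-only sketch of the pre-fork-handler idea described above, using a stand-in lock rather than NumPy's actual internals:

```python
import os
import threading

lock = threading.Lock()  # stand-in for np.random's internal lock

# Take the lock before fork and release it on both sides afterwards,
# so the child never starts life with an orphaned held lock.
os.register_at_fork(
    before=lock.acquire,
    after_in_parent=lock.release,
    after_in_child=lock.release,
)

pid = os.fork()
if pid == 0:
    ok = lock.acquire(blocking=False)  # succeeds: the after_in_child handler released it
    os._exit(0 if ok else 1)
_, status = os.waitpid(pid, 0)
child_could_lock = os.WEXITSTATUS(status) == 0
print("child could take the lock:", child_could_lock)
```

This only illustrates the mechanism; as noted above, a real fix would need to cover every lock in NumPy, not just one.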