RandomState_ctor accesses /dev/urandom, a significant bottleneck when copying RandomStates
The __RandomState_ctor function in numpy.random.__init__ constructs a new RandomState object without an explicit seed. The unseeded call results in an access to /dev/urandom, which is wildly expensive. The purpose of this function is to aid in pickling: the internal state of the new object is immediately overwritten with the state of the pickled RandomState, so this expensive access doesn't actually accomplish anything. Because the state of the newly created RandomState is irrelevant (it will be replaced during unpickling), it's safe to use the fast cached seed value of 0. An example of the updated function is below, along with a nose-based unit test that verifies the behavior. To incorporate this into the master branch, the RandomState() call in __init__.py in numpy.random simply needs to have seed=0 added to it.
import numpy
from numpy.random import RandomState


def __Fast_RandomState_ctor():
    """Patch the __RandomState_ctor initialization routine for performance.

    numpy.random.__init__.__RandomState_ctor calls RandomState() directly, which
    interacts with the filesystem and kernel via /dev/urandom, causing a major
    performance bottleneck when running many short parallel jobs.

    This function overrides that code and uses the seed=0 version to avoid the bad
    system call interaction. As far as I can tell this is totally kosher, as this
    routine appears to be used for pickling only, and produces a random state object
    that will be initialized in some other way later.
    """
    return RandomState(seed=0)


# Install the faster constructor in place of the original.
numpy.random.__RandomState_ctor = __Fast_RandomState_ctor
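The claim that the placeholder's seed does not matter can be checked using only the public RandomState API. The following is a minimal sketch, not the code path NumPy itself uses: it assumes that restoring a pickled generator amounts to building a placeholder object and then overwriting its state, which is emulated here with get_state() and set_state(). The seed 12345 and the variable names are arbitrary choices for illustration.

```python
import pickle

import numpy as np

# Source generator whose stream we want to reproduce.
original = np.random.RandomState(seed=12345)
original.rand()  # advance it so the state is not a freshly seeded one

# Placeholder built with a fixed seed, as the proposed constructor would do.
placeholder = np.random.RandomState(seed=0)

# Emulate what restoring does: overwrite the placeholder's state entirely.
placeholder.set_state(original.get_state())

# A real pickle round trip, for comparison.
restored = pickle.loads(pickle.dumps(original))

# All three generators should now produce the same stream.
expected = original.rand(5)
from_placeholder = placeholder.rand(5)
from_pickle = restored.rand(5)

assert np.array_equal(expected, from_placeholder)
assert np.array_equal(expected, from_pickle)
```

Since set_state() replaces the whole internal state, nothing derived from the placeholder's seed survives, which is the property the seed=0 change relies on.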
Unit test to verify the behavior:
import copy

import nose.tools
import numpy as np


def test_copy_random_state():
    """Test that we can deep copy a random state properly.

    Checks that our constructor optimization is safe.
    """
    def state_eq(ran1, ran2):
        """Need to know too much about structure of RandomState.get_state()"""
        name1, state1 = ran1.get_state()[0:2]
        name2, state2 = ran2.get_state()[0:2]
        nose.tools.eq_(name1, name2)
        nose.tools.ok_((state1 == state2).all())

    def checkme(seed):
        """Ensure that RandomState created with seed can be duplicated properly"""
        if seed == 'no_arg':
            state = np.random.RandomState()
        else:
            state = np.random.RandomState(seed=seed)
        dup1 = copy.deepcopy(state)  # dup1 is the same as state
        state_eq(state, dup1)
        state_rands = (state.rand(), state.rand(), state.rand())
        dup1_rand1 = dup1.rand()  # dup1 is two behind state
        dup2 = copy.deepcopy(dup1)  # dup2 is == dup1, and two behind state
        state_eq(dup1, dup2)
        dup1_rands = (dup1_rand1, dup1.rand(), dup1.rand())
        dup2_rands = (dup1_rand1, dup2.rand(), dup2.rand())
        nose.tools.eq_(state_rands, dup1_rands, 'Original and immediate duplicate should produce the same random numbers')
        nose.tools.eq_(state_rands, dup2_rands, 'Dup2 should be dup1 + 1 rand')
        # ensure that we're really not synchronized
        state.rand()
        for _ in range(100):
            nose.tools.assert_not_equal(state.rand(), dup1.rand())
            nose.tools.assert_not_equal(state.rand(), dup2.rand())

    # the patched constructor must be the one installed in numpy.random
    nose.tools.eq_(np.random.__RandomState_ctor, __Fast_RandomState_ctor)
    yield checkme, 0
    yield checkme, None
    yield checkme, 'no_arg'
    yield checkme, [1, 2, 3]
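The test above depends on nose and on yield-based test generators. For environments without nose, the same idea can be sketched with plain assertions. This is a reduced, hypothetical variant (the helper name check_copy is not from the original issue) that uses only copy.deepcopy and the public RandomState API, and it does not check which constructor is installed.

```python
import copy

import numpy as np


def check_copy(seed_args):
    """Check that a deep copy of a RandomState replays the original's stream."""
    state = np.random.RandomState(*seed_args)
    dup = copy.deepcopy(state)

    # Right after copying, the generator name and key array must match.
    name1, keys1 = state.get_state()[0:2]
    name2, keys2 = dup.get_state()[0:2]
    assert name1 == name2
    assert (keys1 == keys2).all()

    # Both objects must then produce the same numbers.
    assert np.array_equal(state.rand(3), dup.rand(3))

    # The copy is independent: advancing only the original desynchronizes them.
    state.rand()
    assert state.rand() != dup.rand()


# Same cases as the original test: seed 0, seed None, no argument, list seed.
for seed_args in [(0,), (None,), (), ([1, 2, 3],)]:
    check_copy(seed_args)
```

Passing the constructor arguments as a tuple stands in for the 'no_arg' sentinel used in the original test.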
Issue Analytics
- Created 9 years ago
- Comments:14 (11 by maintainers)
Top GitHub Comments
It is on virtualized machines on AWS, especially when running with multiple cores via multiprocessing. Each access is much more expensive, and you have many processes contending for the same system resource. I can confirm the effect is much less significant directly on my MacBook, but on a c3.8xlarge on AWS we observe that 90% of our CPU time is spent blocking on /dev/urandom syscall requests.
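For anyone who wants to gauge the per-construction cost on their own machine, a rough single-process timing sketch follows. It is only an illustration: the repetition count N is an arbitrary assumption, the numbers depend heavily on the NumPy version and platform, and a single process does not reproduce the multi-process contention described above.

```python
import timeit

import numpy as np

N = 200  # arbitrary repetition count, chosen only for illustration

# Construct generators without a seed (entropy is taken from the OS).
unseeded = timeit.timeit(lambda: np.random.RandomState(), number=N)

# Construct generators with a fixed seed (no OS entropy needed).
seeded = timeit.timeit(lambda: np.random.RandomState(seed=0), number=N)

print(f"unseeded: {unseeded / N * 1e6:.1f} us per construction")
print(f"seed=0:   {seeded / N * 1e6:.1f} us per construction")
```

To approximate the reported scenario more closely, the same measurement could be run concurrently in several worker processes on the target machine.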
On Sat, May 31, 2014 at 12:13 PM, Julian Taylor notifications@github.com wrote:
Mark A. DePristo mark@depristo.com
Here you go:
https://github.com/numpy/numpy/pull/4768
Mark
On Mon, Jun 2, 2014 at 1:26 PM, Charles Harris notifications@github.com wrote: