cupy.random.randint is slow
Hi all. It seems that numpy.random.randint is notably faster than cupy.random.randint. Is this expected?
I also get a dramatic speedup by sampling uniform floats with cupy and rounding them to int.
import cupy as cp
import numpy as np
import time

def runWithTime(f, iters):
    # Time `iters` consecutive calls of f and print the elapsed seconds.
    start = time.time()
    for i in range(iters):
        f()
    print(time.time() - start)

size = 100000
high = 1000000
iters = 50000

runWithTime(lambda: cp.random.uniform(low=0, high=high, size=size).astype('int', copy=False), iters)
runWithTime(lambda: np.random.randint(0, high, size=size), iters)
runWithTime(lambda: cp.random.randint(0, high, size=size), iters)
prints
2.910490036010742
23.180258750915527
36.04059100151062
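A note on the measurement itself: CuPy launches GPU work asynchronously, so a wall-clock timer around GPU calls can under-report the real cost unless the device is synchronized before the clock is read. The helper below is a minimal sketch of that idea, not the code used for the numbers above. The name `run_with_time` and its `sync` parameter are made up for illustration; the helper uses only the standard library, and the CuPy usage is shown in comments as an assumption about how it would be wired up.

```python
import time

def run_with_time(f, iters, sync=None):
    """Time `iters` calls of `f`; call `sync()` before each clock read."""
    if sync is not None:
        sync()  # drain work queued before the measurement starts
    start = time.perf_counter()
    for _ in range(iters):
        f()
    if sync is not None:
        sync()  # wait for asynchronous work launched by f
    return time.perf_counter() - start

# CPU-only usage: no synchronization is needed.
elapsed = run_with_time(lambda: sum(range(1000)), iters=100)

# Assumed GPU usage (requires CuPy and a CUDA device):
#   import cupy as cp
#   run_with_time(lambda: cp.random.randint(0, 1000000, size=100000),
#                 iters=1000, sync=cp.cuda.Stream.null.synchronize)
```

With 50000 iterations the launch queue probably fills up anyway, so the ranking above may well stay the same, but synchronizing removes the doubt.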
Is there a reason why we cannot rewrite cupy.random.randint to rely on cupy.random.uniform?
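To make the proposal concrete, here is a minimal sketch of drawing integers by scaling uniform floats. It is written with the standard library so it runs without a GPU; `randint_via_uniform` is a hypothetical name, not a CuPy function, and the clamp at the end is an assumption about what a safe version would need, since the product can round up to the upper bound.

```python
import random

def randint_via_uniform(low, high, size, rng=random):
    """Draw `size` integers in [low, high) by scaling uniform floats."""
    span = high - low
    out = []
    for _ in range(size):
        u = rng.random()              # float in [0.0, 1.0)
        k = low + int(u * span)       # truncate toward zero
        out.append(min(k, high - 1))  # guard in case rounding reaches `high`
    return out

sample = randint_via_uniform(0, 1000000, size=10)
```

Every element needs exactly one random draw and no retry, which is presumably why this shape maps well onto a GPU. The price is that the result is only approximately uniform.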
- Conditions
CuPy Version : 6.0.0
CUDA Root : /usr/local/cuda
CUDA Build Version : 10000
CUDA Driver Version : 10010
CUDA Runtime Version : 10000
cuDNN Build Version : 7301
cuDNN Version : 7605
NCCL Build Version : 1000
NCCL Runtime Version : (unknown)
Linux 4.4.180-102-default
Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz
Tesla P100
Issue Analytics
- State:
- Created 3 years ago
- Reactions:2
- Comments:6 (4 by maintainers)
Top GitHub Comments
The current algorithm matches NumPy's. However, that algorithm, rejection sampling, is not very efficient on a GPU.
numpy: https://github.com/numpy/numpy/blob/a72b89c7c2e30f5df5cf27f68b6afd45361934fd/numpy/random/src/distributions/distributions.c#L1072-L1081
cupy: https://github.com/cupy/cupy/blob/9a8cdf80b212f31278d7168851b1dac0c6d75f06/cupy/random/_generator.py#L689-L710
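For readers unfamiliar with the term, the sketch below shows one common form of rejection sampling for bounded integers: draw just enough random bits to cover the range and retry whenever the candidate falls outside it. It illustrates the general technique and is not a copy of the NumPy or CuPy code linked above; `bounded_randint` is a hypothetical name.

```python
import random

def bounded_randint(n, rng=random):
    """Return an unbiased integer in [0, n) using mask-and-reject."""
    if n <= 0:
        raise ValueError("n must be positive")
    bits = (n - 1).bit_length()  # smallest bit width that can hold n - 1
    while True:
        # Candidate is uniform over [0, 2**bits); bits == 0 only when n == 1.
        candidate = rng.getrandbits(bits) if bits else 0
        if candidate < n:
            return candidate  # accepted: uniform over [0, n)
        # otherwise reject and draw again

value = bounded_randint(1000000)
```

Each draw is accepted with probability greater than 1/2, so the loop is cheap on a CPU. On a GPU the number of retries differs from element to element, so some elements need extra passes while the rest wait, which is one plausible reading of "not very efficient" here.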
@anaruse Could you please clarify a little further? Is your argument based on the fact that the uniform distribution is modeled by float64 values that have equal probability within intervals of equal exponent field, like [0.5, 1)?
randint seems to be implemented only for int32, and it is possible to distribute 2^52 equiprobable objects into 2^32 bins almost uniformly. And in general (not just on [0.5, 1)), we have even more float64-valued objects, which are not equiprobable.
Also, I've noticed that cupy.random.uniform might return high. A check for this prints True. Is this expected behavior?
Speaking of "almost": how do you test the random generators that cupy provides?
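The "almost uniformly" claim can be checked by counting in a toy model. The sketch below assumes 2^k equally likely integer values, a stand-in for mantissa values within one exponent range, and maps value m to bin floor(m * n / 2^k). It uses small numbers so it runs quickly; `bin_counts` is a hypothetical helper, and the model says nothing about what cuRAND or CuPy actually produce.

```python
from collections import Counter

def bin_counts(k, n):
    """Count how many of the 2**k equally likely values land in each of n bins."""
    total = 1 << k
    # Value m goes to bin floor(m * n / 2**k), computed exactly with integers.
    return Counter((m * n) >> k for m in range(total))

counts = bin_counts(k=16, n=1000)
smallest, largest = min(counts.values()), max(counts.values())
print(smallest, largest)
```

In this model every bin receives either floor(2^k / n) or ceil(2^k / n) values, so bin sizes differ by at most one and the relative bias is on the order of n / 2^k. When n divides 2^k, as with 2^32 bins and 2^52 values, the bins are exactly equal at 2^20 values each.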