cupy.random.randint is slow

Hi all. It seems that numpy.random.randint is notably faster than cupy.random.randint. Is this expected?

I also get a dramatic speedup by sampling uniform floats with cupy and casting them to int (the cast truncates toward zero).

import cupy as cp
import numpy as np
import time

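def runWithTime(f, iters):
    # Time `iters` calls of f and print the elapsed wall-clock seconds.
    start = time.time()
    for i in range(iters):
        f()
    # CuPy launches kernels asynchronously; wait for the GPU to finish
    # so that any still-queued work is included in the measurement.
    cp.cuda.Stream.null.synchronize()
    print(time.time() - start)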

size = 100000
high = 1000000
iters = 50000

runWithTime(lambda: cp.random.uniform(low=0, high=high, size=size).astype('int', copy=False), iters)
runWithTime(lambda: np.random.randint(0, high, size=size), iters)
runWithTime(lambda: cp.random.randint(0, high, size=size), iters)

prints

2.910490036010742
23.180258750915527
36.04059100151062

Is there a reason why we cannot rewrite cupy.random.randint to rely on cupy.random.uniform?

Conditions

CuPy Version          : 6.0.0
CUDA Root             : /usr/local/cuda
CUDA Build Version    : 10000
CUDA Driver Version   : 10010
CUDA Runtime Version  : 10000
cuDNN Build Version   : 7301
cuDNN Version         : 7605
NCCL Build Version    : 1000
NCCL Runtime Version  : (unknown)
Linux 4.4.180-102-default 
Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz
Tesla P100

Issue Analytics

Created: 3 years ago
Reactions: 2
Comments: 6 (4 by maintainers)

Top GitHub Comments

2 reactions

toslunar commented, Oct 13, 2020

The current algorithm aligns with numpy. However, the algorithm, rejection sampling, is not very efficient on a GPU.

numpy: https://github.com/numpy/numpy/blob/a72b89c7c2e30f5df5cf27f68b6afd45361934fd/numpy/random/src/distributions/distributions.c#L1072-L1081

cupy: https://github.com/cupy/cupy/blob/9a8cdf80b212f31278d7168851b1dac0c6d75f06/cupy/random/_generator.py#L689-L710
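
To illustrate why rejection sampling is awkward on a GPU, here is a minimal pure-Python sketch of masked rejection sampling. It is an assumption about the general technique, not a copy of the linked numpy or cupy code; the function name `masked_rejection_randint` is hypothetical. The point to notice is that the number of raw draws depends on the random data itself, so parallel threads would not all finish after the same number of steps.

```python
import random

def masked_rejection_randint(n, count, rng):
    """Draw `count` integers uniformly from [0, n) by masked rejection.

    Illustrative sketch only; assumes 1 <= n <= 2**32.
    """
    # Smallest mask of the form 2**k - 1 that covers n - 1.
    mask = (1 << (n - 1).bit_length()) - 1
    out = []
    draws = 0
    while len(out) < count:
        candidate = rng.getrandbits(32) & mask
        draws += 1
        if candidate < n:
            # Accept; any candidate >= n is rejected and redrawn.
            out.append(candidate)
    return out, draws

rng = random.Random(0)
samples, draws = masked_rejection_randint(1_000_000, 10_000, rng)
print(len(samples), draws)
```

With n = 1000000 the mask is 2**20 - 1, so roughly 1000000 / 1048576 (about 95%) of the candidates are accepted. When n is just above a power of two, the acceptance rate drops toward 50% and the number of retries grows accordingly.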

0 reactions

akopich commented, Oct 13, 2020

@anaruse Could you please clarify a little further? Is your argument based on the fact that the uniform distribution is modeled by float64 values that are equiprobable only within intervals sharing the same exponent field, like [0.5, 1)? randint seems to be implemented only for int32, and it is possible to distribute 2^52 equiprobable objects into 2^32 bins almost uniformly. And in general (not only on [0.5, 1)), we have even more float64-valued objects, which are not equiprobable.

Also, I’ve noticed that cupy.random.uniform might return high.
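
The "almost uniformly" counting argument can be checked at a small scale. The sketch below is an illustration under a simplifying assumption: the float source is modeled as exactly 2**bits equally likely values, which the comment itself notes is not quite how float64 behaves. The helper `bin_counts` is a hypothetical name. It maps each value k to the bin floor(k * n / 2**bits) and counts how many values land in each bin.

```python
from collections import Counter

def bin_counts(bits, n):
    """Count how many of the 2**bits equally likely values k fall into
    each of n bins under the mapping k -> floor(k * n / 2**bits)."""
    return Counter((k * n) >> bits for k in range(1 << bits))

counts = bin_counts(bits=8, n=6)
print(sorted(counts.items()))
# → [(0, 43), (1, 43), (2, 42), (3, 43), (4, 43), (5, 42)]
print(max(counts.values()) - min(counts.values()))
# → 1
```

Bin sizes differ by at most one value, so the relative bias between bins is about n / 2**bits. Under the same assumption, 2^52 values spread over at most 2^32 bins would give a relative bias of at most about 2^-20.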

import cupy as cp

high = int(2**32) - 1
low = int(2**32) - 3
size = 100000000
print(cp.max(cp.random.uniform(low=low, high=high, size=size)) == high)

This prints True. Is this expected behavior?
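
One plausible explanation is floating-point rounding. The NumPy documentation for numpy.random.uniform notes that high may be included in the result because of rounding in low + (high - low) * random_sample(). Whether cupy uses exactly the same formula is an assumption here. The pure-Python sketch below applies that formula to the largest double below 1.0, using the same low and high as above.

```python
# Largest double strictly below 1.0. It stands in for the largest value a
# half-open [0, 1) generator could return (an assumption about the generator).
u_max = 1.0 - 2.0 ** -53

low = float(2**32 - 3)
high = float(2**32 - 1)

# Assumed formula, as documented for numpy.random.uniform.
sample = low + (high - low) * u_max

print(sample == high)
# → True
```

Doubles just below 2^32 are spaced 2^-21 apart, so under this formula any u above roughly 1 - 2^-23 rounds up to high. With 100000000 samples that works out to about a dozen expected hits, which would be consistent with the True printed above.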

Speaking of “almost”: how do you test the random generators that cupy provides?