
cupy garbage collection is 100 times slower than GPU computing


I found an interesting phenomenon in CuPy: when Python's garbage collection is on, the computation takes 100 times longer than when it is off.

This may be a known issue, but it surprised me that [a * 2.0 for i in range(2)], where a is a cp.ndarray, is 100 times slower than [a * 2.0 for i in range(1)].

(Note that %timeit disables garbage collection during its timing loop, so the loop that %timeit itself creates does not slow down the computation.)
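Python's timeit module, which IPython's %timeit builds on, turns the garbage collector off while it times a statement, and its documentation gives setup="gc.enable()" as the way to turn it back on. The sketch below checks this from inside the timed statement; the flags list is a hypothetical probe added only for illustration.

```python
import gc
import timeit

# The timed statement records whether the garbage collector is enabled while it runs.
stmt = "flags.append(gc.isenabled())"

flags = []
# Default behaviour: timeit disables the collector for the duration of the timing loop.
timeit.timeit(stmt, globals={"gc": gc, "flags": flags}, number=3)
default_flags = list(flags)

flags.clear()
# setup="gc.enable()" runs inside the timing function, after timeit has
# disabled the collector, so collection is active for the timed statement.
timeit.timeit(stmt, setup="gc.enable()", globals={"gc": gc, "flags": flags}, number=3)
enabled_flags = list(flags)

print(default_flags)  # [False, False, False]
print(enabled_flags)  # [True, True, True]
```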

$ pip freeze | grep cupy
cupy==5.0.0b1

Python 3.5.2 (default, Nov 23 2017, 16:37:01)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.4.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import cupy as cp

In [2]: a = cp.arange(8*16*100*100*100, dtype=cp.float32).reshape((8, 16, 100, 100, 100))

In [3]: %timeit -n100 -r10 c = [a * 2.0 for i in range(1)]
The slowest run took 5.26 times longer than the fastest. This could mean that an intermediate result is being cached.
110 us +- 89.9 us per loop (mean +- std. dev. of 10 runs, 100 loops each)

In [4]: %timeit -n100 -r10 c = [a * 2.0 for i in range(2)]
The slowest run took 40.09 times longer than the fastest. This could mean that an intermediate result is being cached.
2.66 ms +- 2.53 ms per loop (mean +- std. dev. of 10 runs, 100 loops each)
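A quick calculation suggests the 110 us figure cannot be the full cost of the multiplication. The array from In [2] holds 8*16*100*100*100 float32 values, and a * 2.0 must read all of them and write a new array of the same size. The bandwidth constant below is an assumed round number for illustration, not a figure from this thread or a measurement.

```python
# Back-of-the-envelope cost of one `a * 2.0` on the array created in In [2].
n_elements = 8 * 16 * 100 * 100 * 100   # 128,000,000 elements
n_bytes = n_elements * 4                # float32 is 4 bytes -> 512,000,000 bytes
traffic = 2 * n_bytes                   # read `a` once, write the result once

# ASSUMPTION: hypothetical GPU memory bandwidth of 500 GB/s, chosen for illustration.
ASSUMED_BANDWIDTH = 500e9               # bytes per second
min_kernel_time = traffic / ASSUMED_BANDWIDTH

print(n_bytes)                                                    # 512000000
print(round(min_kernel_time * 1e3, 3), "ms per multiplication")  # 2.048 ms per multiplication
```

Under that assumption each multiplication needs roughly 2 ms of pure memory traffic, far above the 110 us mean measured for range(1). That is consistent with the launch-queue explanation given later in the thread: the fast loops were timing kernel launches, not finished kernels.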


Top GitHub Comments


fiarabbit commented, Jul 18, 2018

@brandondube I now fully understand why the queue was backed up by the ~1024 pending kernels. Thank you very much for your kind reply and thoughtful explanation.

@kmaehashi Now that I understand why my code is so slow, this issue can be closed. (I'll close it.)
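The closing comment attributes the slowdown to the queue of roughly 1024 pending kernels filling up. Because CuPy launches kernels asynchronously, a plain wall-clock loop mostly measures launch overhead until that queue is full; after that, each new launch waits for earlier kernels to finish. Synchronizing the device before reading the clock gives a fairer number. The helper below is a minimal sketch: timed_runs and its parameters are hypothetical names, not part of CuPy, and the CuPy usage in the comments assumes a CUDA device is available.

```python
import time


def timed_runs(fn, sync=lambda: None, repeat=10, number=100):
    """Return the mean per-call wall time (seconds) for each of `repeat` runs.

    `sync` is called before starting and before stopping the clock, so any
    asynchronous work queued by `fn` (for example GPU kernels) is counted.
    """
    results = []
    for _ in range(repeat):
        sync()  # drain work left over from a previous run
        start = time.perf_counter()
        for _ in range(number):
            fn()
        sync()  # wait for queued work to finish before reading the clock
        results.append((time.perf_counter() - start) / number)
    return results


# Hypothetical usage with CuPy (requires a CUDA device):
# import cupy as cp
# a = cp.arange(8*16*100*100*100, dtype=cp.float32).reshape((8, 16, 100, 100, 100))
# times = timed_runs(lambda: [a * 2.0 for i in range(2)],
#                    sync=cp.cuda.Device().synchronize)
```

Passing sync as a parameter keeps the helper usable for eager CPU code (the default no-op) as well as for GPU code. Newer CuPy releases also ship cupyx.profiler.benchmark, which handles device synchronization itself; it is not available in CuPy 5.0.0b1.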


kmaehashi commented, Jun 4, 2018

> @kmaehashi I do not mean to take your time on this issue, but cupy 5b compiles fine with MSVC++ 14 build tools and CUDA 9.2. A note about this may be added to cupy, or cuda 9.2 added to the build matrix, etc.

Thanks for the heads-up! We’re going to release Windows wheels for CUDA 9.2 in the next release.