CuPy garbage collection is 100 times slower than GPU computing

I found an interesting phenomenon in CuPy: when Python’s garbage collection is on, the elapsed computation time is 100 times longer than when it is off.

This may be a known issue, but it surprised me that [a * 3.0 for i in range(2)] is 100 times slower than [a * 3.0 for i in range(1)], where a is a cp.ndarray.

(Note that %timeit forces garbage collection off inside its own loop, so the loop that %timeit runs does not itself slow down the computation.)

$ pip freeze | grep cupy
cupy==5.0.0b1
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.4.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import cupy as cp

In [2]: a = cp.arange(8*16*100*100*100, dtype=cp.float32).reshape((8, 16, 100, 100, 100))

In [3]: %timeit -n100 -r10 c = [a * 2.0 for i in range(1)]
The slowest run took 5.26 times longer than the fastest. This could mean that an intermediate result is being cached.
110 us +- 89.9 us per loop (mean +- std. dev. of 10 runs, 100 loops each)

In [4]: %timeit -n100 -r10 c = [a * 2.0 for i in range(2)]
The slowest run took 40.09 times longer than the fastest. This could mean that an intermediate result is being cached.
2.66 ms +- 2.53 ms per loop (mean +- std. dev. of 10 runs, 100 loops each)
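
The remark about garbage collection being switched off during timing can be checked without a GPU. The following is a minimal sketch using only the standard library; the helper name time_with_and_without_gc is hypothetical and not part of the original issue. It relies on the documented behaviour of the timeit module, which disables garbage collection while timing unless the setup step re-enables it.

```python
import gc
import timeit


def time_with_and_without_gc(stmt, number=1000, repeat=5):
    """Return (best_gc_off, best_gc_on) in seconds per call of `stmt`.

    `stmt` is a zero-argument callable. This helper is illustrative only.
    """
    was_enabled = gc.isenabled()
    try:
        # timeit turns the garbage collector off while timing by default.
        timer_off = timeit.Timer(stmt)
        # Passing gc.enable as the setup callable turns it back on
        # before the timed loop starts.
        timer_on = timeit.Timer(stmt, setup=gc.enable)
        best_off = min(timer_off.repeat(repeat=repeat, number=number)) / number
        best_on = min(timer_on.repeat(repeat=repeat, number=number)) / number
    finally:
        # Restore whatever GC state the caller had.
        if was_enabled:
            gc.enable()
        else:
            gc.disable()
    return best_off, best_on


if __name__ == "__main__":
    # A CPU-only workload that allocates many small container objects.
    off, on = time_with_and_without_gc(lambda: [[i] for i in range(1000)])
    print("GC off: %.1f us per call" % (off * 1e6))
    print("GC on:  %.1f us per call" % (on * 1e6))
```

For the CuPy case in this issue, stmt could be something like lambda: [a * 2.0 for i in range(2)]. Because GPU kernel launches are asynchronous, the callable would probably also need to call cp.cuda.Stream.null.synchronize() so that the measurement covers completed GPU work and not only the launch overhead. That variant is an assumption and has not been run here.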

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 9 (3 by maintainers)

Top GitHub Comments

3 reactions
fiarabbit commented, Jul 18, 2018

@brandondube I now fully understand why the queue was backed up by the 1024+ kernels. Thank you very much for your kind reply and thoughtful explanation.

@kmaehashi Now that I understand why my code is so slow, this issue can be closed. (I’ll close it.)

1 reaction
kmaehashi commented, Jun 4, 2018

> @kmaehashi I do not mean to take your time on this issue, but cupy 5b compiles fine with MSVC++ 14 build tools and CUDA 9.2. A note about this may be added to cupy, or cuda 9.2 added to the build matrix, etc.

Thanks for the heads-up! We’re going to release Windows wheels for CUDA 9.2 in the next release.
