`cudaErrorLaunchTimeout` error in Windows

The following code works perfectly:

import numpy as np
import cupy as cp

mp = cp.get_default_memory_pool()

n = 12000
A_cpu = np.random.rand(n, n).astype(np.float32)
B_cpu = np.random.rand(n, n).astype(np.float32)

max_iter = 5
for i in range(max_iter):
    A_gpu = cp.asarray(A_cpu)
    B_gpu = cp.asarray(B_cpu)
    C_gpu = cp.zeros((n, n), dtype=np.float32)
    # cp.dot(A_gpu, B_gpu, out=C_gpu)
    print('iter {:d}/{:d}, memory: {:.2f} G'.format(i, max_iter - 1, mp.used_bytes() / (2**30)))

Output is:

iter 0/9, memory: 1.61 G
iter 1/9, memory: 1.61 G
iter 2/9, memory: 1.61 G
iter 3/9, memory: 1.61 G
iter 4/9, memory: 1.61 G

However, the same code with cp.dot() enabled raises a CUDARuntimeError (cudaErrorLaunchTimeout) from cupy.cuda.runtime.hostAlloc:

import numpy as np
import cupy as cp

mp = cp.get_default_memory_pool()

n = 12000
A_cpu = np.random.rand(n, n).astype(np.float32)
B_cpu = np.random.rand(n, n).astype(np.float32)

max_iter = 5
for i in range(max_iter):
    A_gpu = cp.asarray(A_cpu)
    B_gpu = cp.asarray(B_cpu)
    C_gpu = cp.zeros((n, n), dtype=np.float32)
    cp.dot(A_gpu, B_gpu, out=C_gpu)
    print('iter {:d}/{:d}, memory: {:.2f} G'.format(i, max_iter - 1, mp.used_bytes() / (2**30)))

Output is:

iter 0/4, memory: 1.61 G
iter 1/4, memory: 1.61 G
iter 2/4, memory: 1.61 G
iter 3/4, memory: 1.61 G
Traceback (most recent call last):
  File "D:/WST/projects/nuance/src-py/nuance/playground.py", line 13, in <module>
    B_gpu = cp.asarray(B_cpu)
  File "C:\Dev\Python\python-3.6\Lib\site-packages\cupy\creation\from_data.py", line 60, in asarray
    return core.array(a, dtype, False)
  File "cupy\core\core.pyx", line 2117, in cupy.core.core.array
  File "cupy\core\core.pyx", line 2157, in cupy.core.core.array
  File "cupy\cuda\pinned_memory.pyx", line 212, in cupy.cuda.pinned_memory.alloc_pinned_memory
  File "cupy\cuda\pinned_memory.pyx", line 286, in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc
  File "cupy\cuda\pinned_memory.pyx", line 306, in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc
  File "cupy\cuda\pinned_memory.pyx", line 303, in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc
  File "cupy\cuda\pinned_memory.pyx", line 177, in cupy.cuda.pinned_memory._malloc
  File "cupy\cuda\pinned_memory.pyx", line 178, in cupy.cuda.pinned_memory._malloc
  File "cupy\cuda\pinned_memory.pyx", line 29, in cupy.cuda.pinned_memory.PinnedMemory.__init__
  File "cupy\cuda\runtime.pyx", line 229, in cupy.cuda.runtime.hostAlloc
  File "cupy\cuda\runtime.pyx", line 135, in cupy.cuda.runtime.check_status
cupy.cuda.runtime.CUDARuntimeError: cudaErrorLaunchTimeout: the launch timed out and was terminated

My environment is:

Windows 10
python 3.6.5
cupy-cuda92 5.0.0b3

Issue Analytics

Created 5 years ago
Comments: 12 (5 by maintainers)

Top GitHub Comments

alkalinin commented, Aug 20, 2018

My latest statistics with n = 10000: with the C_gpu[0, 0].get() line added, I get 500 successful iterations:

...
iter 494/499, memory: 1.12 G
iter 495/499, memory: 1.12 G
iter 496/499, memory: 1.12 G
iter 497/499, memory: 1.12 G
iter 498/499, memory: 1.12 G
iter 499/499, memory: 1.12 G

Process finished with exit code 0

For my practical tasks this really works; for me the problem is solved.
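
A minimal sketch of this workaround, assuming the C_gpu[0, 0].get() line goes right after cp.dot() (the comment does not show the exact placement). The helper name multiply_with_sync and its xp parameter are hypothetical: xp is the array module, so passing cupy exercises the GPU path and passing numpy gives a CPU dry run. Copying one element to the host makes the Python loop wait for the queued product before the next iteration starts.

```python
import numpy as np


def multiply_with_sync(xp, n=256, max_iter=3):
    """Repeat the matrix product, reading one result element back each time.

    xp is the array module: cupy on a GPU machine, numpy for a CPU dry run.
    Returns the first element of the product from every iteration.
    """
    A_cpu = np.random.rand(n, n).astype(np.float32)
    B_cpu = np.random.rand(n, n).astype(np.float32)

    first_elements = []
    for i in range(max_iter):
        A = xp.asarray(A_cpu)
        B = xp.asarray(B_cpu)
        C = xp.zeros((n, n), dtype=np.float32)
        xp.dot(A, B, out=C)

        # With CuPy, C[0, 0] is a 0-d device array and .get() copies it to
        # the host, which blocks until the queued product has finished.
        # With NumPy, C[0, 0] is already a host scalar and has no .get().
        elem = C[0, 0]
        value = elem.get() if hasattr(elem, "get") else elem
        first_elements.append(float(value))
    return first_elements
```

On a GPU machine this could be called as multiply_with_sync(cupy, n=10000, max_iter=500) to mirror the run above; the small defaults are only there to keep a dry run cheap.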

w-m commented, Aug 20, 2018

I still suspect this is a watchdog issue

Ah indeed, I missed that the error in the original report is a cudaErrorLaunchTimeout. I tried it locally, and for some values of n and max_iter my machine actually ran out of host memory instead of throwing that error. I guess these are different effects with roughly the same underlying cause (too many unfinished asynchronous operations).
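
If the cause really is a growing backlog of asynchronous work, one way to bound it is to synchronize explicitly every few iterations. The sketch below is an illustration under that assumption, not something tested in this thread: run_with_periodic_sync, xp, synchronize and sync_every are hypothetical names, and the synchronize callable is passed in so the loop can be dry-run on the CPU with numpy and a no-op.

```python
import numpy as np


def run_with_periodic_sync(xp, synchronize, n=256, max_iter=8, sync_every=2):
    """Run the matrix products, draining queued work every few iterations.

    xp          -- array module (cupy on a GPU, numpy for a CPU dry run)
    synchronize -- zero-argument callable that blocks until queued work is done
    Returns the number of times synchronize was called.
    """
    A_cpu = np.random.rand(n, n).astype(np.float32)
    B_cpu = np.random.rand(n, n).astype(np.float32)

    sync_calls = 0
    for i in range(max_iter):
        A = xp.asarray(A_cpu)
        B = xp.asarray(B_cpu)
        C = xp.zeros((n, n), dtype=np.float32)
        xp.dot(A, B, out=C)

        # Wait for outstanding work so at most sync_every products are queued.
        if (i + 1) % sync_every == 0:
            synchronize()
            sync_calls += 1

    # Final drain so nothing is still in flight when the function returns.
    synchronize()
    return sync_calls + 1
```

With CuPy the callable could be cupy.cuda.Stream.null.synchronize, for example run_with_periodic_sync(cupy, cupy.cuda.Stream.null.synchronize, n=12000, max_iter=5). Whether this avoids the timeout on a given driver and watchdog setting is not verified here.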