`cudaErrorLaunchTimeout` error in Windows

The following code works perfectly:

import numpy as np
import cupy as cp

mp = cp.get_default_memory_pool()

n = 12000
A_cpu = np.random.rand(n, n).astype(np.float32)
B_cpu = np.random.rand(n, n).astype(np.float32)

max_iter = 5
for i in range(max_iter):
    A_gpu = cp.asarray(A_cpu)
    B_gpu = cp.asarray(B_cpu)
    C_gpu = cp.zeros((n, n), dtype=np.float32)
    #cp.dot(A_gpu, B_gpu, out=C_gpu)
    print('iter {:d}/{:d}, memory: {:.2f} G'.format(i, max_iter - 1, mp.used_bytes() / (2**30)))

Output is:

iter 0/4, memory: 1.61 G
iter 1/4, memory: 1.61 G
iter 2/4, memory: 1.61 G
iter 3/4, memory: 1.61 G
iter 4/4, memory: 1.61 G

However, the same code with cp.dot() enabled raises a CUDARuntimeError (cudaErrorLaunchTimeout) from cupy.cuda.runtime.hostAlloc:

import numpy as np
import cupy as cp

mp = cp.get_default_memory_pool()

n = 12000
A_cpu = np.random.rand(n, n).astype(np.float32)
B_cpu = np.random.rand(n, n).astype(np.float32)

max_iter = 5
for i in range(max_iter):
    A_gpu = cp.asarray(A_cpu)
    B_gpu = cp.asarray(B_cpu)
    C_gpu = cp.zeros((n, n), dtype=np.float32)
    cp.dot(A_gpu, B_gpu, out=C_gpu)
    print('iter {:d}/{:d}, memory: {:.2f} G'.format(i, max_iter - 1, mp.used_bytes() / (2**30)))

Output is:

iter 0/4, memory: 1.61 G
iter 1/4, memory: 1.61 G
iter 2/4, memory: 1.61 G
iter 3/4, memory: 1.61 G
Traceback (most recent call last):
  File "D:/WST/projects/nuance/src-py/nuance/playground.py", line 13, in <module>
    B_gpu = cp.asarray(B_cpu)
  File "C:\Dev\Python\python-3.6\Lib\site-packages\cupy\creation\from_data.py", line 60, in asarray
    return core.array(a, dtype, False)
  File "cupy\core\core.pyx", line 2117, in cupy.core.core.array
  File "cupy\core\core.pyx", line 2157, in cupy.core.core.array
  File "cupy\cuda\pinned_memory.pyx", line 212, in cupy.cuda.pinned_memory.alloc_pinned_memory
  File "cupy\cuda\pinned_memory.pyx", line 286, in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc
  File "cupy\cuda\pinned_memory.pyx", line 306, in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc
  File "cupy\cuda\pinned_memory.pyx", line 303, in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc
  File "cupy\cuda\pinned_memory.pyx", line 177, in cupy.cuda.pinned_memory._malloc
  File "cupy\cuda\pinned_memory.pyx", line 178, in cupy.cuda.pinned_memory._malloc
  File "cupy\cuda\pinned_memory.pyx", line 29, in cupy.cuda.pinned_memory.PinnedMemory.__init__
  File "cupy\cuda\runtime.pyx", line 229, in cupy.cuda.runtime.hostAlloc
  File "cupy\cuda\runtime.pyx", line 135, in cupy.cuda.runtime.check_status
cupy.cuda.runtime.CUDARuntimeError: cudaErrorLaunchTimeout: the launch timed out and was terminated

My environment is:

  • Windows 10
  • python 3.6.5
  • cupy-cuda92 5.0.0b3

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 12 (5 by maintainers)

Top GitHub Comments

1 reaction
alkalinin commented, Aug 20, 2018

My latest results, with n = 10000: after adding a C_gpu[0, 0].get() line to the loop, I get 500 successful iterations:

...
iter 494/499, memory: 1.12 G
iter 495/499, memory: 1.12 G
iter 496/499, memory: 1.12 G
iter 497/499, memory: 1.12 G
iter 498/499, memory: 1.12 G
iter 499/499, memory: 1.12 G

Process finished with exit code 0

This really works for my practical tasks, so for me the problem is solved.

0 reactions
w-m commented, Aug 20, 2018

"I still suspect this is a watchdog issue"

Ah indeed, I missed that the error in the original report is a cudaErrorLaunchTimeout. Tried it locally and for some n and max_iter, my machine actually ran out of host memory, instead of throwing the error. I guess they’re different effects, but have somewhat the same underlying cause (too many async unfinished operations).
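
The workaround from the comments above, reading one element of the result back to the host in every iteration, can be sketched roughly as follows. This is only an illustration, not the maintainers' recommended fix: the helper name multiply_with_sync and the NumPy fallback (used when CuPy or a CUDA device is unavailable) are assumptions added here so the sketch runs anywhere. The idea is that C_gpu[0, 0].get() blocks until the pending cp.dot() kernel has finished, so unfinished asynchronous operations cannot pile up across iterations.

```python
import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises if no usable CUDA device
    xp = cp
except Exception:
    cp = None
    xp = np  # assumption: CPU fallback so the sketch runs without a GPU


def multiply_with_sync(a_cpu, b_cpu, max_iter):
    """Hypothetical helper: repeat the matmul, waiting for the GPU each iteration."""
    checks = []
    for _ in range(max_iter):
        a_dev = xp.asarray(a_cpu)
        b_dev = xp.asarray(b_cpu)
        c_dev = xp.zeros((a_cpu.shape[0], b_cpu.shape[1]), dtype=np.float32)
        xp.dot(a_dev, b_dev, out=c_dev)
        # Copying one element to the host forces a wait for the kernel,
        # which is the effect of the C_gpu[0, 0].get() line in the thread.
        first = c_dev[0, 0]
        if cp is not None:
            first = first.get()
        checks.append(float(first))
    return checks


if __name__ == "__main__":
    n = 256  # small size for a quick check; the report used n = 12000
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    print(multiply_with_sync(a, b, max_iter=5))
```

An explicit cp.cuda.Stream.null.synchronize() call after cp.dot() should have a similar blocking effect without copying any data; whether either variant avoids the timeout for a given n depends on the GPU and driver.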


Top Results From Across the Web

Why do I receive the "CUDA_ERROR_LAUNCH_TIMEOUT ...
This error occurs when a gpuArray operation or a CUDA kernel code runs for a long time on a GPU that is used...

CUDA ERROR LAUNCH TIMEOUT || Problem Solved

How to avoid Cuda error 6 (Launch Timeout) with consecutive ...
I would like to avoid this synchronization, because it slows the program down a lot. Since kernel launches are asynchronous, I guess the...

cuda the launch timed out and was terminated
The kernel execute without problems. ... I am using Windows and one video card, so i dont know how finishing X or explorer...

"the launch timed out and was terminated". What is wrong?
If you are running Windows, there are two possible reasons why this error message appears. You are either using a GeForce card as...