`cudaErrorLaunchTimeout` error in Windows
The following code works perfectly:
import numpy as np
import cupy as cp

mp = cp.get_default_memory_pool()
n = 12000
A_cpu = np.random.rand(n, n).astype(np.float32)
B_cpu = np.random.rand(n, n).astype(np.float32)
max_iter = 5
for i in range(max_iter):
    A_gpu = cp.asarray(A_cpu)
    B_gpu = cp.asarray(B_cpu)
    C_gpu = cp.zeros((n, n), dtype=np.float32)
    # cp.dot(A_gpu, B_gpu, out=C_gpu)
    print('iter {:d}/{:d}, memory: {:.2f} G'.format(i, max_iter - 1, mp.used_bytes() / (2**30)))
Output is:
iter 0/9, memory: 1.61 G
iter 1/9, memory: 1.61 G
iter 2/9, memory: 1.61 G
iter 3/9, memory: 1.61 G
iter 4/9, memory: 1.61 G
However, the same code with cp.dot() enabled raises a CUDARuntimeError from cupy.cuda.runtime.hostAlloc:
import numpy as np
import cupy as cp

mp = cp.get_default_memory_pool()
n = 12000
A_cpu = np.random.rand(n, n).astype(np.float32)
B_cpu = np.random.rand(n, n).astype(np.float32)
max_iter = 5
for i in range(max_iter):
    A_gpu = cp.asarray(A_cpu)
    B_gpu = cp.asarray(B_cpu)
    C_gpu = cp.zeros((n, n), dtype=np.float32)
    cp.dot(A_gpu, B_gpu, out=C_gpu)
    print('iter {:d}/{:d}, memory: {:.2f} G'.format(i, max_iter - 1, mp.used_bytes() / (2**30)))
Output is:
iter 0/4, memory: 1.61 G
iter 1/4, memory: 1.61 G
iter 2/4, memory: 1.61 G
iter 3/4, memory: 1.61 G
Traceback (most recent call last):
  File "D:/WST/projects/nuance/src-py/nuance/playground.py", line 13, in <module>
    B_gpu = cp.asarray(B_cpu)
  File "C:\Dev\Python\python-3.6\Lib\site-packages\cupy\creation\from_data.py", line 60, in asarray
    return core.array(a, dtype, False)
  File "cupy\core\core.pyx", line 2117, in cupy.core.core.array
  File "cupy\core\core.pyx", line 2157, in cupy.core.core.array
  File "cupy\cuda\pinned_memory.pyx", line 212, in cupy.cuda.pinned_memory.alloc_pinned_memory
  File "cupy\cuda\pinned_memory.pyx", line 286, in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc
  File "cupy\cuda\pinned_memory.pyx", line 306, in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc
  File "cupy\cuda\pinned_memory.pyx", line 303, in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc
  File "cupy\cuda\pinned_memory.pyx", line 177, in cupy.cuda.pinned_memory._malloc
  File "cupy\cuda\pinned_memory.pyx", line 178, in cupy.cuda.pinned_memory._malloc
  File "cupy\cuda\pinned_memory.pyx", line 29, in cupy.cuda.pinned_memory.PinnedMemory.__init__
  File "cupy\cuda\runtime.pyx", line 229, in cupy.cuda.runtime.hostAlloc
  File "cupy\cuda\runtime.pyx", line 135, in cupy.cuda.runtime.check_status
cupy.cuda.runtime.CUDARuntimeError: cudaErrorLaunchTimeout: the launch timed out and was terminated
My environment is:
- Windows 10
- python 3.6.5
- cupy-cuda92 5.0.0b3
Issue Analytics
- Created 5 years ago
- Comments: 12 (5 by maintainers)
Top GitHub Comments
My latest statistics with `n = 10000`: with the `C_gpu[0, 0].get()` line I get 500 successful iterations. For my practical tasks it really works; for me the problem is SOLVED!!!

Ah indeed, I missed that the error in the original report is a `cudaErrorLaunchTimeout`. Tried it locally and for some `n` and `max_iter`, my machine actually ran out of host memory, instead of throwing the error. I guess they're different effects, but have somewhat the same underlying cause (too many async unfinished operations).
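
For readers landing here, the workaround from the comments above can be sketched roughly as follows: read one element of the result back to the host in every iteration, so the loop cannot queue up many unfinished GPU operations. This is only a minimal sketch, not the maintainers' recommended fix. The function name `dot_with_sync`, the small matrix size, and the NumPy fallback (used only so the sketch runs on a machine without a usable GPU) are assumptions added for illustration.

```python
import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises if no usable CUDA device is present
    xp = cp
except Exception:
    cp = None
    xp = np  # assumption: CPU fallback so the sketch also runs without a GPU


def dot_with_sync(A_cpu, B_cpu, max_iter=5):
    """Repeat the product from the report, blocking once per iteration.

    Hypothetical helper: returns C[0, 0] as a Python float for each iteration.
    """
    first_elements = []
    for _ in range(max_iter):
        A = xp.asarray(A_cpu)
        B = xp.asarray(B_cpu)
        C = xp.zeros((A_cpu.shape[0], B_cpu.shape[1]), dtype=np.float32)
        xp.dot(A, B, out=C)
        if cp is not None:
            # Copying one element to the host waits for the queued GPU work,
            # which is the effect of the C_gpu[0, 0].get() line in the comment.
            value = float(C[0, 0].get())
        else:
            value = float(C[0, 0])  # NumPy is already synchronous
        first_elements.append(value)
    return first_elements


if __name__ == "__main__":
    n = 1000  # far smaller than the n = 12000 in the report, to keep the demo quick
    A_cpu = np.random.rand(n, n).astype(np.float32)
    B_cpu = np.random.rand(n, n).astype(np.float32)
    for i, v in enumerate(dot_with_sync(A_cpu, B_cpu)):
        print('iter {:d}: C[0, 0] = {:.3f}'.format(i, v))
```

An explicit `cp.cuda.Stream.null.synchronize()` call at the end of each iteration should have a similar blocking effect without the device-to-host copy, although that variant was not reported as tested in this thread.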