NEP-18: CUDA illegal memory access with CuPy
Calling .compute() on multiple results from dask.array.linalg.svd() or dask.array.linalg.qr() causes a CUDA illegal memory access. Example:
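```python
import cupy
import dask.array as da

# Tall-and-skinny CuPy array, split into 5 row chunks of 1000 x 1000.
x = cupy.random.random((5000, 1000))
# asarray=False passes the CuPy chunks through unchanged instead of converting them to NumPy.
d = da.from_array(x, chunks=(1000, 1000), asarray=False)

u, s, v = da.linalg.svd(d)
s.compute()  # first compute succeeds
v.compute()  # second compute fails (traceback below)
```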
```
Traceback (most recent call last):
  File "svd_illegal_mem.py", line 10, in <module>
    v.compute()
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/base.py", line 156, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/base.py", line 398, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/threaded.py", line 76, in get
    pack_exception=pack_exception, **kwargs)
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/local.py", line 474, in get_async
    finish(dsk, state, not succeeded)
  File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/callbacks.py", line 99, in local_callbacks
    yield callbacks or ()
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/local.py", line 459, in get_async
    raise_exception(exc, tb)
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/compatibility.py", line 112, in reraise
    raise exc
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/local.py", line 230, in execute_task
    result = _execute_task(task, data)
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/core.py", line 118, in _execute_task
    args2 = [_execute_task(a, cache) for a in args]
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/core.py", line 118, in <listcomp>
    args2 = [_execute_task(a, cache) for a in args]
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/core.py", line 119, in _execute_task
    return func(*args2)
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/optimization.py", line 942, in __call__
    dict(zip(self.inkeys, args)))
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/core.py", line 149, in get
    result = _execute_task(task, cache)
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/core.py", line 119, in _execute_task
    return func(*args2)
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/array/linalg.py", line 49, in _wrapped_qr
    return np.linalg.qr(a)
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/numpy/core/overrides.py", line 165, in public_api
    implementation, public_api, relevant_args, args, kwargs)
  File "cupy/core/core.pyx", line 1256, in cupy.core.core.ndarray.__array_function__
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/cupy/linalg/decomposition.py", line 135, in qr
    tau.data.ptr, workspace.data.ptr, buffersize, dev_info.data.ptr)
  File "cupy/cuda/cusolver.pyx", line 472, in cupy.cuda.cusolver.dgeqrf
  File "cupy/cuda/cusolver.pyx", line 479, in cupy.cuda.cusolver.dgeqrf
  File "cupy/cuda/cusolver.pyx", line 243, in cupy.cuda.cusolver.check_status
cupy.cuda.cusolver.CUSOLVERError: CUSOLVER_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 193, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 82, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 193, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 82, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 193, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 82, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 193, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 82, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Error in sys.excepthook:

Original exception was:
```
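The traceback shows how the call reaches cuSOLVER. Dask's _wrapped_qr calls np.linalg.qr(a) on each chunk. NumPy's public_api wrapper in numpy/core/overrides.py then forwards the call to cupy.core.core.ndarray.__array_function__, which is the NEP-18 protocol, and CuPy runs its own qr. The sketch below shows that dispatch path on the CPU. LoggingArray is a hypothetical stand-in for a CuPy array, not part of any library. The sketch assumes NumPy 1.17 or newer, where NEP-18 dispatch is enabled by default.

```python
import numpy as np


class LoggingArray:
    """Hypothetical CPU-only stand-in for a CuPy array: records every NumPy
    function that NEP-18 dispatches to it, then delegates to NumPy."""

    def __init__(self, data):
        self.data = np.asarray(data)
        self.calls = []

    def __array_function__(self, func, types, args, kwargs):
        # NumPy calls this instead of its own implementation because an
        # argument of np.linalg.qr defines __array_function__ (NEP-18).
        self.calls.append(func.__name__)
        # Unwrap top-level LoggingArray arguments so the delegated call sees
        # plain ndarrays and does not dispatch back here.
        plain_args = [a.data if isinstance(a, LoggingArray) else a for a in args]
        return func(*plain_args, **kwargs)


chunk = LoggingArray(np.arange(6.0).reshape(3, 2))
q, r = np.linalg.qr(chunk)  # the same call _wrapped_qr makes on each chunk
print(chunk.calls)  # ['qr']
```

With CuPy chunks, the equivalent of the delegation step is cupy.linalg.qr, so the failing code runs in CuPy/cuSOLVER even though Dask only ever calls np.linalg.qr.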
In the example above, the second call always fails. If we execute v.compute() before s.compute(), the latter hits the illegal memory access instead. The same happens if .compute() is called multiple times on the same return value, and dask.array.linalg.qr() behaves the same way. Please note I intentionally ignored the value u here, because it fails due to a separate bug: https://github.com/dask/dask/issues/4481.
Also note that neither CuPy alone nor Dask with NumPy fails. The error only occurs with Dask on a CuPy array.
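The traceback suggests that both svd() and qr() on this tall-and-skinny array reduce to many per-chunk qr calls. The sketch below illustrates that structure with a one-level tall-and-skinny QR (TSQR) written in plain NumPy. It is a simplified illustration under stated assumptions, not Dask's implementation, which builds a task graph and handles more cases. The functions tsqr_one_level and svd_via_tsqr are hypothetical names. The sketch assumes the matrix has at least as many rows as columns.

```python
import numpy as np


def tsqr_one_level(a, block_rows):
    """Hypothetical one-level TSQR sketch (not Dask's code); assumes m >= n.

    Every row block gets its own np.linalg.qr call, mirroring the
    per-chunk qr calls visible in the traceback."""
    m, n = a.shape
    blocks = [a[i:i + block_rows] for i in range(0, m, block_rows)]
    # Stage 1: independent QR of every row block.
    local = [np.linalg.qr(b) for b in blocks]
    # Stage 2: QR of the stacked small R factors.
    q2, r = np.linalg.qr(np.vstack([rk for _, rk in local]))
    # Stage 3: fold the matching slice of the stage-2 Q into each local Q.
    q_parts, offset = [], 0
    for qk, rk in local:
        k = rk.shape[0]
        q_parts.append(qk @ q2[offset:offset + k])
        offset += k
    return np.vstack(q_parts), r


def svd_via_tsqr(a, block_rows):
    """Thin SVD from the TSQR factors: A = Q R and R = U_r S V^T
    give A = (Q U_r) S V^T, so only the small n x n R needs an SVD."""
    q, r = tsqr_one_level(a, block_rows)
    u_r, s, vt = np.linalg.svd(r)
    return q @ u_r, s, vt


# Scaled-down analogue of the 5000 x 1000 array with 1000-row chunks.
rng = np.random.default_rng(0)
a = rng.random((500, 100))
q, r = tsqr_one_level(a, block_rows=100)
u, s, vt = svd_via_tsqr(a, block_rows=100)
```

Under this structure, computing s and then v from separate graphs would re-run the per-block qr calls. That would be consistent with the failure appearing on the second compute, but this sketch does not confirm it.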
Using compute_svd=False (AFAIK, available only in dask.array.linalg.tsqr()) gives the same result. I guess the problem is actually in dask.array.linalg.qr(), since the traceback shows that both svd() and tsqr() call qr() shortly before the memory errors.

Thanks for tracking this @pentschev!
On Mon, Apr 8, 2019 at 7:53 AM Peter Andreas Entschev <notifications@github.com> wrote: