NEP-18: CUDA illegal memory access with CuPy

Calling .compute() on more than one of the results returned by dask.array.linalg.svd() or dask.array.linalg.qr() causes a CUDA illegal memory access. Example:

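import cupy
import dask.array as da

# Build the input on the GPU as a CuPy array.
x = cupy.random.random((5000, 1000))

# asarray=False passes the chunks through as-is instead of converting them to NumPy arrays.
d = da.from_array(x, chunks=(1000, 1000), asarray=False)

u, s, v = da.linalg.svd(d)
s.compute()  # first call succeeds
v.compute()  # second call raises the error below

Traceback (most recent call last):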
  File "svd_illegal_mem.py", line 10, in <module>
    v.compute()
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/base.py", line 156, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/base.py", line 398, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/threaded.py", line 76, in get
    pack_exception=pack_exception, **kwargs)
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/local.py", line 474, in get_async
    finish(dsk, state, not succeeded)
  File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/callbacks.py", line 99, in local_callbacks
    yield callbacks or ()
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/local.py", line 459, in get_async
    raise_exception(exc, tb)
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/compatibility.py", line 112, in reraise
    raise exc
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/local.py", line 230, in execute_task
    result = _execute_task(task, data)
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/core.py", line 118, in _execute_task
    args2 = [_execute_task(a, cache) for a in args]
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/core.py", line 118, in <listcomp>
    args2 = [_execute_task(a, cache) for a in args]
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/core.py", line 119, in _execute_task
    return func(*args2)
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/optimization.py", line 942, in __call__
    dict(zip(self.inkeys, args)))
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/core.py", line 149, in get
    result = _execute_task(task, cache)
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/core.py", line 119, in _execute_task
    return func(*args2)
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/array/linalg.py", line 49, in _wrapped_qr
    return np.linalg.qr(a)
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/numpy/core/overrides.py", line 165, in public_api
    implementation, public_api, relevant_args, args, kwargs)
  File "cupy/core/core.pyx", line 1256, in cupy.core.core.ndarray.__array_function__
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/cupy/linalg/decomposition.py", line 135, in qr
    tau.data.ptr, workspace.data.ptr, buffersize, dev_info.data.ptr)
  File "cupy/cuda/cusolver.pyx", line 472, in cupy.cuda.cusolver.dgeqrf
  File "cupy/cuda/cusolver.pyx", line 479, in cupy.cuda.cusolver.dgeqrf
  File "cupy/cuda/cusolver.pyx", line 243, in cupy.cuda.cusolver.check_status
cupy.cuda.cusolver.CUSOLVERError: CUSOLVER_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 193, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 82, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 193, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 82, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 193, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 82, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 193, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 82, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Error in sys.excepthook:

Original exception was:

In the example above, the second call always fails: if we execute v.compute() before s.compute(), then s.compute() is the one that hits the illegal memory access. The same happens when .compute() is called multiple times on the same return value, and dask.array.linalg.qr() behaves the same way. Note that I intentionally ignored the value u here, because it fails due to a separate bug: https://github.com/dask/dask/issues/4481.

Also note that neither CuPy alone nor Dask with NumPy fails. The error occurs only when Dask operates on a CuPy array.
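For comparison, the NumPy-backed control case mentioned above can be sketched roughly as follows. This is a CPU-only illustration, not code from the original report: the variable names s1, v1, s2, and s_ref are introduced here, and the cross-check against np.linalg.svd is an added sanity test.

```python
import numpy as np
import dask.array as da

# Same shape and chunking as the CuPy reproducer, but backed by NumPy.
x = np.random.random((5000, 1000))
d = da.from_array(x, chunks=(1000, 1000))

u, s, v = da.linalg.svd(d)

# Call .compute() on several results, and twice on the same one,
# mirroring the call pattern that fails with CuPy-backed input.
s1 = s.compute()
v1 = v.compute()
s2 = s.compute()

# Added sanity check (not in the report): compare the singular values
# with NumPy's own SVD of the full array.
s_ref = np.linalg.svd(x, compute_uv=False)
print(np.allclose(s1, s_ref), np.allclose(s1, s2))
```

With NumPy arrays every call here is expected to complete, which is consistent with the observation that the failure is specific to the Dask plus CuPy combination.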

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 11 (11 by maintainers)

Top GitHub Comments

pentschev commented, Feb 14, 2019

Using compute_svd=False (AFAIK, available only in dask.array.linalg.tsqr()) gives the same result. I guess the problem is actually in dask.array.linalg.qr(), since the traceback shows that both svd() and tsqr() call qr() shortly before the memory errors.
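The calls discussed in this comment can be sketched as below. This is a minimal CPU-only illustration using NumPy-backed input, so it does not reproduce the error; the array shape, chunking, and variable names are assumptions chosen for the example.

```python
import numpy as np
import dask.array as da

# A tall-and-skinny array with a single column of chunks.
x = np.random.random((2000, 200))
d = da.from_array(x, chunks=(500, 200))

# Plain QR, and tall-and-skinny QR with the SVD step disabled.
q, r = da.linalg.qr(d)
q_ts, r_ts = da.linalg.tsqr(d, compute_svd=False)

# Compute the outputs one after the other, as in the report.
r_val = r.compute()
q_val = q.compute()
r_ts_val = r_ts.compute()

# q @ r should reconstruct the input.
reconstruction_ok = np.allclose(q_val @ r_val, x)
print(reconstruction_ok)
```

Swapping the NumPy input for a CuPy array passed with asarray=False would give the configuration described in the issue.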

mrocklin commented, Apr 8, 2019

Thanks for tracking this, @pentschev!

On Mon, Apr 8, 2019 at 7:53 AM Peter Andreas Entschev <notifications@github.com> wrote:

Closed #4487 https://github.com/dask/dask/issues/4487.
