[BUG] `cp.dot` causes "an illegal memory access was encountered"
Conditions (you can just paste the output of `python -c 'import cupy; cupy.show_config()'`)

CuPy Version          : 7.3.0
CUDA Root             : /usr/local/cuda
CUDA Build Version    : 10000
CUDA Driver Version   : 10010
CUDA Runtime Version  : 10000
cuBLAS Version        : 10000
cuFFT Version         : 10000
cuRAND Version        : 10000
cuSOLVER Version      : (10, 0, 0)
cuSPARSE Version      : 10000
NVRTC Version         : (10, 0)
cuDNN Build Version   : 7605
cuDNN Version         : 7600
NCCL Build Version    : 2406
NCCL Runtime Version  : 2604
Code to reproduce
import cupy as cp
X = cp.random.rand(100000000*40, dtype='float32')
X = X.reshape((100000000, 40), order='F')
B = 2 * cp.random.rand(30, 2, dtype='float32') - 1
X[:, 30:32] = cp.dot(X[:, :30], B)
Error messages, stack traces, or logs
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "cupy/core/core.pyx", line 1248, in cupy.core.core.ndarray.__setitem__
File "cupy/core/_routines_indexing.pyx", line 49, in cupy.core._routines_indexing._ndarray_setitem
File "cupy/core/_routines_indexing.pyx", line 810, in cupy.core._routines_indexing._scatter_op
File "cupy/core/_kernel.pyx", line 951, in cupy.core._kernel.ufunc.__call__
File "cupy/core/_kernel.pyx", line 974, in cupy.core._kernel.ufunc._get_ufunc_kernel
File "cupy/core/_kernel.pyx", line 714, in cupy.core._kernel._get_ufunc_kernel
File "cupy/core/_kernel.pyx", line 61, in cupy.core._kernel._get_simple_elementwise_kernel
File "cupy/core/carray.pxi", line 194, in cupy.core.core.compile_with_cache
File "/home/dgala/miniconda3/envs/cuml_try/lib/python3.7/site-packages/cupy/cuda/compiler.py", line 287, in compile_with_cache
extra_source, backend)
File "/home/dgala/miniconda3/envs/cuml_try/lib/python3.7/site-packages/cupy/cuda/compiler.py", line 335, in _compile_with_cache_cuda
mod.load(cubin)
File "cupy/cuda/function.pyx", line 197, in cupy.cuda.function.Module.load
File "cupy/cuda/function.pyx", line 199, in cupy.cuda.function.Module.load
File "cupy/cuda/driver.pyx", line 240, in cupy.cuda.driver.moduleLoadData
File "cupy/cuda/driver.pyx", line 118, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Issue Analytics: created 3 years ago · 6 comments (5 by maintainers)
Top GitHub Comments
The cause of the problem has been nearly identified. There seems to be a bug in the GEMM implementation of cuBLAS in CUDA 10.2 or older: when at least one of the input matrices has more than 2 giga elements and that matrix is transposed inside cuBLAS, the results become incorrect or a segmentation fault occurs.
This bug is fixed in CUDA 11.
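For reference, a minimal sketch of how one might flag an operand that matches the case described above, on the assumption that a large 2-D array that is not C-contiguous is the one CuPy hands to cuBLAS with a transpose op; the helper name and the dispatch condition are illustrative, not CuPy API:

import cupy as cp

def may_hit_cublas_gemm_transpose_bug(a, cuda_runtime_version=10000):
    # Hypothetical helper: returns True for a 2-D cupy.ndarray with more
    # than 2 giga elements on CUDA <= 10.2 whose layout (not C-contiguous)
    # is assumed to make CuPy pass it to cuBLAS with a transpose op.
    return (cuda_runtime_version <= 10020
            and a.ndim == 2
            and a.size > 2 * 1024 ** 3
            and not a.flags.c_contiguous)

# A small F-ordered array is below the size threshold, so it is not flagged:
small = cp.zeros((1000, 30), dtype='float32', order='F')
assert not may_hit_cublas_gemm_transpose_bug(small)
# X[:, :30] from the reproducer (3e9 float32 elements, F-contiguous)
# would be flagged on CUDA 10.0.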
You might work around this problem by transposing the matrices in CuPy before calling the cuBLAS GEMM, since the problem does not occur if the matrices are not transposed inside cuBLAS. However, this will increase memory usage.
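A minimal sketch of that workaround applied to the reproducer above, assuming that a C-contiguous copy of the large operand (made with `cp.ascontiguousarray`) is enough to keep cuBLAS from transposing it; the extra copy is where the additional memory usage comes from:

import cupy as cp

X = cp.random.rand(100000000 * 40, dtype='float32')
X = X.reshape((100000000, 40), order='F')
B = 2 * cp.random.rand(30, 2, dtype='float32') - 1

# Materialize the large F-ordered slice as a C-contiguous copy before the
# dot, so the GEMM is assumed to run on a non-transposed operand. The copy
# needs roughly 12 GB of extra GPU memory (3e9 float32 elements).
A = cp.ascontiguousarray(X[:, :30])
X[:, 30:32] = cp.dot(A, B)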
Let me close this as the issue is fixed in the latest CUDA.