[BUG] `cp.dot` causes illegal memory access encountered

  • Conditions (you can just paste the output of python -c 'import cupy; cupy.show_config()')

CuPy Version          : 7.3.0
CUDA Root             : /usr/local/cuda
CUDA Build Version    : 10000
CUDA Driver Version   : 10010
CUDA Runtime Version  : 10000
cuBLAS Version        : 10000
cuFFT Version         : 10000
cuRAND Version        : 10000
cuSOLVER Version      : (10, 0, 0)
cuSPARSE Version      : 10000
NVRTC Version         : (10, 0)
cuDNN Build Version   : 7605
cuDNN Version         : 7600
NCCL Build Version    : 2406
NCCL Runtime Version  : 2604

  • Code to reproduce

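import cupy as cp
# 100,000,000 x 40 float32 matrix (4e9 elements, about 16 GB), Fortran order
X = cp.random.rand(100000000*40, dtype='float32')
X = X.reshape((100000000, 40), order='F')
# 30 x 2 coefficient matrix with entries in [-1, 1)
B = 2 * cp.random.rand(30, 2, dtype='float32') - 1
# X[:, :30] has 3e9 elements; the traceback below is raised by this assignment
X[:, 30:32] = cp.dot(X[:, :30], B)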
  • Error messages, stack traces, or logs
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "cupy/core/core.pyx", line 1248, in cupy.core.core.ndarray.__setitem__
  File "cupy/core/_routines_indexing.pyx", line 49, in cupy.core._routines_indexing._ndarray_setitem
  File "cupy/core/_routines_indexing.pyx", line 810, in cupy.core._routines_indexing._scatter_op
  File "cupy/core/_kernel.pyx", line 951, in cupy.core._kernel.ufunc.__call__
  File "cupy/core/_kernel.pyx", line 974, in cupy.core._kernel.ufunc._get_ufunc_kernel
  File "cupy/core/_kernel.pyx", line 714, in cupy.core._kernel._get_ufunc_kernel
  File "cupy/core/_kernel.pyx", line 61, in cupy.core._kernel._get_simple_elementwise_kernel
  File "cupy/core/carray.pxi", line 194, in cupy.core.core.compile_with_cache
  File "/home/dgala/miniconda3/envs/cuml_try/lib/python3.7/site-packages/cupy/cuda/compiler.py", line 287, in compile_with_cache
    extra_source, backend)
  File "/home/dgala/miniconda3/envs/cuml_try/lib/python3.7/site-packages/cupy/cuda/compiler.py", line 335, in _compile_with_cache_cuda
    mod.load(cubin)
  File "cupy/cuda/function.pyx", line 197, in cupy.cuda.function.Module.load
  File "cupy/cuda/function.pyx", line 199, in cupy.cuda.function.Module.load
  File "cupy/cuda/driver.pyx", line 240, in cupy.cuda.driver.moduleLoadData
  File "cupy/cuda/driver.pyx", line 118, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

3 reactions
anaruse commented, Sep 1, 2020

The cause of the problem has been nearly identified. There seems to be a bug in the gemm implementation of cuBLAS in CUDA 10.2 or older: when at least one of the input matrices has more than 2 giga elements and that matrix is transposed in cuBLAS, the result becomes incorrect or a segmentation fault occurs.

This bug is fixed in CUDA 11.

You might work around this problem by transposing the matrices in CuPy before calling cuBLAS gemm, since the problem does not occur if the matrices are not transposed in cuBLAS. However, this will increase memory usage…
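
A different way to stay clear of the condition described above, offered only as a sketch and not as the fix suggested in this comment, is to split the product into row blocks so that no single operand passed to the matrix multiply exceeds the element limit. The helper `chunked_dot_into` and the constant `MAX_ELEMS` below are hypothetical names, and the exact threshold (taken here as the int32 limit) is an assumption, since the comment only says "more than 2 giga elements". The demo uses small NumPy arrays so it runs anywhere; whether the same approach avoids the cuBLAS bug on an affected CUDA version has not been verified.

```python
import numpy as np

# Assumption: "more than 2 giga elements" is read as exceeding the int32 limit.
MAX_ELEMS = 2**31 - 1


def chunked_dot_into(out, a, b, max_elems=MAX_ELEMS):
    """Write a @ b into `out`, one block of rows at a time.

    Hypothetical helper, not part of CuPy. Each block of `a` holds at most
    `max_elems` elements (or a single row if one row is already larger).
    Works with any 2-D array type that supports slicing and the @ operator.
    """
    n_rows, n_inner = a.shape
    rows_per_chunk = max(1, max_elems // max(1, n_inner))
    for start in range(0, n_rows, rows_per_chunk):
        stop = min(start + rows_per_chunk, n_rows)
        out[start:stop] = a[start:stop] @ b
    return out


# Small-scale version of the reproduction script, using NumPy.
rng = np.random.default_rng(0)
X = np.asfortranarray(rng.random((1000, 40), dtype=np.float32))
B = 2 * rng.random((30, 2), dtype=np.float32) - 1

expected = X[:, :30] @ B  # reference result computed in one call
# Force many small blocks (64 rows each) to exercise the chunking logic.
chunked_dot_into(X[:, 30:32], X[:, :30], B, max_elems=30 * 64)
```

Because the function relies only on slicing and `@`, CuPy arrays could be passed in place of the NumPy ones. Note that row slices of a Fortran-ordered array are not contiguous, so each block may involve an extra temporary copy.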

0 reactions
kmaehashi commented, Aug 11, 2021

Let me close this as the issue is fixed in the latest CUDA.
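
For anyone who cannot upgrade right away, a rough guard along the following lines could flag calls that match the conditions reported in this thread. It is a sketch under stated assumptions: `may_hit_gemm_bug` is a hypothetical helper, the version integer is assumed to use the same encoding as the `show_config()` output above (10000 for CUDA 10.0, 11000 for CUDA 11.0), and the element threshold is assumed to be the int32 limit. In CuPy, `cupy.cuda.runtime.runtimeGetVersion()` should return an integer in that encoding.

```python
import math

# Assumptions, not confirmed by the thread: the fix is keyed to CUDA 11.0
# (encoded as 11000) and the size limit is the int32 maximum.
FIRST_FIXED_CUDA = 11000
MAX_ELEMS = 2**31 - 1


def may_hit_gemm_bug(cuda_version, *operand_shapes):
    """Return True if a matrix product might match the reported conditions.

    Hypothetical helper. `cuda_version` is an integer such as 10000;
    `operand_shapes` are the shapes of the matrices passed to the product.
    """
    if cuda_version >= FIRST_FIXED_CUDA:
        return False
    return any(math.prod(shape) > MAX_ELEMS for shape in operand_shapes)


# Shapes from the reproduction script: X[:, :30] and B.
affected = may_hit_gemm_bug(10000, (100000000, 30), (30, 2))
fixed = may_hit_gemm_bug(11000, (100000000, 30), (30, 2))
```

The check is deliberately conservative: it does not know whether cuBLAS will actually transpose an operand, so it can report True for calls that would have worked.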
