v7.4.0 cupy/cuda/driver.pyx error line 118
Hi,
I’m working in conda envs with conda installs. Hit a snag upgrading from cupy 6.0.0 to 7.4.0 with rapidsai.
The MRE runs in cupy 6.0.0 and crashes in 7.4.0 with this error:
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 247, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 118, in cupy.cuda.driver.check_status
TypeError: 'NoneType' object is not callable
In Jupyter Notebook the MRE errors one line later:
CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
The complete stack trace is in the attached notebook along with additional system and device specs.
Great tool box, thanks.
Tom
# MRE
import numpy as np
import cupy as cp
MB = 1024**2
cp.cuda.Device(3).use()
free, total = cp.cuda.Device(3).mem_info
print(f"MB free {free / MB :.0f} total {total / MB :.0f}")
# ok on these ...
# n, p, g = 250, 5, 10
# n, p, g = 2500, 25, 1000
# errors on these
n, p, g = 25000, 250, 10000
yg = np.random.rand(n, g).astype("float32")
X = np.random.rand(n, p).astype("float32")
ygd = cp.asarray(yg)
Xd = cp.asarray(X)
print(f"MB matrices: {(ygd.nbytes + Xd.nbytes) / MB :.0f}")
assert ygd.nbytes + Xd.nbytes < free
Qd, Rd = cp.linalg.qr(Xd)
bhatsd = cp.linalg.solve(Rd, Qd.T @ ygd)
yhatsd = Xd @ bhatsd # jupyter gets past this line
# ed = yhatsd - ygd # jupyter errors on this line
[sandbox]$ conda activate cupy
(cupy) [sandbox]$ python --version; python -c "import cupy; cupy.show_config()"; python cupy740_crash_mre.py
Python 3.7.7
CuPy Version : 6.0.0
CUDA Root : /usr/local/cuda-8.0
CUDA Build Version : 10000
CUDA Driver Version : 10020
CUDA Runtime Version : 10000
cuDNN Build Version : 7301
cuDNN Version : 7605
NCCL Build Version : 1000
NCCL Runtime Version : (unknown)
MB free 12039 total 12196
MB matrices: 978
(cupy) [sandbox]$ conda deactivate
[sandbox]$ conda activate rapidsai37
(rapidsai37) [sandbox]$ python --version; python -c "import cupy; cupy.show_config()"; python cupy740_crash_mre.py
Python 3.7.6
CuPy Version : 7.4.0
CUDA Root : /home/turbach/.conda/envs/rapidsai37
CUDA Build Version : 10020
CUDA Driver Version : 10020
CUDA Runtime Version : 10020
cuBLAS Version : 10202
cuFFT Version : 10102
cuRAND Version : 10102
cuSOLVER Version : (10, 3, 0)
cuSPARSE Version : 10301
NVRTC Version : (10, 2)
cuDNN Build Version : 7605
cuDNN Version : 7605
NCCL Build Version : 2406
NCCL Runtime Version : 2507
MB free 12027 total 12196
MB matrices: 978
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 247, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 118, in cupy.cuda.driver.check_status
TypeError: 'NoneType' object is not callable
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 247, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 118, in cupy.cuda.driver.check_status
TypeError: 'NoneType' object is not callable
(rapidsai37) [sandbox]$
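For anyone checking the numbers: the "MB matrices: 978" line in both runs follows directly from the float32 array shapes in the failing case. The sketch below redoes that arithmetic in plain Python, no GPU needed. The helper name `matrix_mb` is made up for illustration, and the tally counts only the arrays named in the MRE, not any work buffers the library may allocate internally.

```python
# Sizes taken from the MRE's failing case; float32 is 4 bytes per element.
MB = 1024**2
BYTES_PER_FLOAT32 = 4


def matrix_mb(rows, cols, itemsize=BYTES_PER_FLOAT32):
    """Size of a dense rows x cols array in MB (1 MB = 1024**2 bytes)."""
    return rows * cols * itemsize / MB


n, p, g = 25000, 250, 10000
yg_mb = matrix_mb(n, g)    # yg / ygd, shape (n, g)
X_mb = matrix_mb(n, p)     # X / Xd, shape (n, p)
yhat_mb = matrix_mb(n, g)  # yhatsd = Xd @ bhatsd has the same shape as ygd

print(f"MB matrices: {yg_mb + X_mb:.0f}")
print(f"MB yhatsd:   {yhat_mb:.0f}")
```

The inputs plus the fitted values come to roughly 2 GB against about 12 GB reported free, so these arrays alone would not be expected to exhaust device memory.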
Issue Analytics
- Created 3 years ago
- Comments: 16 (14 by maintainers)
Top GitHub Comments
#3331 fixes this bug. It was an error in the implementation of linalg.solve that caused memory corruption.
I can't reproduce with 10.2
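Since the bug was in linalg.solve, one way to sanity-check a given install is to compare the MRE's QR-based solve against an independent least-squares routine. The sketch below is a CPU-only reference using NumPy on the small case the MRE reports as OK. The function name `qr_least_squares` and the fixed seed are illustrative choices, not part of the original report. To test CuPy itself, the assumption is that you would swap in `cp.linalg.qr` and `cp.linalg.solve` and bring the result back with `cp.asnumpy` before comparing.

```python
import numpy as np


def qr_least_squares(X, Y):
    """Least-squares coefficients via reduced QR, mirroring the MRE."""
    Q, R = np.linalg.qr(X)              # reduced QR: Q is (n, p), R is (p, p)
    return np.linalg.solve(R, Q.T @ Y)  # solve R @ B = Q.T @ Y for B


rng = np.random.default_rng(0)          # fixed seed so runs are repeatable
n, p, g = 250, 5, 10                    # the small case from the MRE
X = rng.random((n, p)).astype("float32")
Y = rng.random((n, g)).astype("float32")

bhats = qr_least_squares(X, Y)

# Independent reference: NumPy's SVD-based least-squares solver.
reference, *_ = np.linalg.lstsq(X, Y, rcond=None)

max_abs_diff = float(np.max(np.abs(bhats - reference)))
print(f"max |QR - lstsq| = {max_abs_diff:.2e}")
```

A large discrepancy between the two results at float32 precision would point at the solve step rather than at the later matrix products.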