v7.4.0 cupy/cuda/driver.pyx error line 118

Hi,

I’m working in conda envs with conda installs. Hit a snag upgrading from cupy 6.0.0 to 7.4.0 with rapidsai.

The MRE runs in cupy 6.0.0 and crashes in 7.4.0 with this error:

Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 247, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 118, in cupy.cuda.driver.check_status
TypeError: 'NoneType' object is not callable

In Jupyter Notebook the MRE errors one line later:

CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

The complete stack trace is in the attached notebook along with additional system and device specs.

Great tool box, thanks.

Tom


cupy740_crash_mre.ipynb.pdf

# MRE
import numpy as np
import cupy as cp

MB = 1024**2
cp.cuda.Device(3).use()

free, total = cp.cuda.Device(3).mem_info
print(f"MB free {free / MB :.0f} total {total / MB :.0f}")

# ok on these ...
# n, p, g = 250, 5, 10
# n, p, g = 2500, 25, 1000

# errors on these
n, p, g = 25000, 250, 10000

yg = np.random.rand(n, g).astype("float32")
X = np.random.rand(n, p).astype("float32")


ygd = cp.asarray(yg)
Xd = cp.asarray(X)
print(f"MB matrices: {(ygd.nbytes + Xd.nbytes) / MB :.0f}")
assert ygd.nbytes + Xd.nbytes < free 

Qd, Rd = cp.linalg.qr(Xd)
bhatsd = cp.linalg.solve(Rd, Qd.T @ ygd)
yhatsd = Xd @ bhatsd  # jupyter gets past this line

# ed = yhatsd - ygd   # jupyter errors on this line
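
To cross-check GPU results, the same QR-based least-squares steps can be run on the CPU with NumPy at one of the small sizes the MRE reports as working. This is only a sketch and not part of the original report: `fit_qr` is a hypothetical helper name, and the comparison against `np.linalg.lstsq` is an added sanity check.

```python
import numpy as np


def fit_qr(X, y):
    """Least-squares fit via reduced QR, mirroring the MRE's steps on the CPU."""
    Q, R = np.linalg.qr(X)               # reduced QR: Q is (n, p), R is (p, p)
    bhats = np.linalg.solve(R, Q.T @ y)  # solve R b = Q^T y for the coefficients
    yhats = X @ bhats                    # fitted values
    return bhats, yhats, yhats - y       # coefficients, fits, residuals


# Small problem size taken from the MRE's "ok on these" comment.
rng = np.random.default_rng(0)
n, p, g = 250, 5, 10
X = rng.random((n, p), dtype=np.float32)
yg = rng.random((n, g), dtype=np.float32)

bhats, yhats, resid = fit_qr(X, yg)

# Independent reference solution from NumPy's least-squares solver.
ref = np.linalg.lstsq(X, yg, rcond=None)[0]
print("matches lstsq:", np.allclose(bhats, ref, atol=1e-3))
```

If the GPU coefficients at a small size agree with this reference but the large case still fails, that points at the size-dependent code path rather than at the input data.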

[sandbox]$ conda activate cupy
(cupy) [sandbox]$ python --version; python -c "import cupy; cupy.show_config()"; python cupy740_crash_mre.py
Python 3.7.7
CuPy Version          : 6.0.0
CUDA Root             : /usr/local/cuda-8.0
CUDA Build Version    : 10000
CUDA Driver Version   : 10020
CUDA Runtime Version  : 10000
cuDNN Build Version   : 7301
cuDNN Version         : 7605
NCCL Build Version    : 1000
NCCL Runtime Version  : (unknown)
MB free 12039 total 12196
MB matrices: 978
(cupy) [sandbox]$ conda deactivate
[sandbox]$ conda activate rapidsai37
(rapidsai37) [sandbox]$ python --version; python -c "import cupy; cupy.show_config()"; python cupy740_crash_mre.py
Python 3.7.6
CuPy Version          : 7.4.0
CUDA Root             : /home/turbach/.conda/envs/rapidsai37
CUDA Build Version    : 10020
CUDA Driver Version   : 10020
CUDA Runtime Version  : 10020
cuBLAS Version        : 10202
cuFFT Version         : 10102
cuRAND Version        : 10102
cuSOLVER Version      : (10, 3, 0)
cuSPARSE Version      : 10301
NVRTC Version         : (10, 2)
cuDNN Build Version   : 7605
cuDNN Version         : 7605
NCCL Build Version    : 2406
NCCL Runtime Version  : 2507
MB free 12027 total 12196
MB matrices: 978
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 247, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 118, in cupy.cuda.driver.check_status
TypeError: 'NoneType' object is not callable
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 247, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 118, in cupy.cuda.driver.check_status
TypeError: 'NoneType' object is not callable
(rapidsai37) [sandbox]$

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 16 (14 by maintainers)

Top GitHub Comments

1 reaction
emcastillo commented, May 12, 2020

#3331 fixes this bug. It was an error in the implementation of linalg.solve that caused memory corruption.
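
The errors in the report surfaced after the `linalg.solve` call (at module unload, and in Jupyter on a later line), which is consistent with CUDA reporting failures from asynchronously launched work at a later API call. One way to narrow down the failing step is to synchronize after each one. The sketch below is illustrative only: `run_step` is a hypothetical helper, and a no-op stands in for the synchronization call so it runs without a GPU.

```python
def run_step(name, fn, sync):
    """Run fn(), then sync(), tagging any exception with the step name."""
    try:
        result = fn()
        sync()  # wait for queued work so a pending error is raised here
    except Exception as exc:
        raise RuntimeError(f"step {name!r} failed: {exc}") from exc
    return result


def no_sync():
    """CPU stand-in: nothing is queued, so there is nothing to wait for."""


# Demo on the CPU; each step is wrapped so a failure names its origin.
total = run_step("sum", lambda: sum(range(4)), no_sync)
print(total)
```

With CuPy, `sync` could be `cp.cuda.Stream.null.synchronize`, assuming the work runs on the default stream, and each MRE line (`qr`, `solve`, the matrix products) would be wrapped in its own `run_step` call.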

1 reaction
emcastillo commented, May 11, 2020

I can’t reproduce with 10.2

CuPy Version          : 8.0.0b2
CUDA Root             : /usr/local/cuda
CUDA Build Version    : 10020
CUDA Driver Version   : 10020
CUDA Runtime Version  : 10020
cuBLAS Version        : 10202
cuFFT Version         : 10102
cuRAND Version        : 10102
cuSOLVER Version      : (10, 3, 0)
cuSPARSE Version      : 10301
NVRTC Version         : (10, 2)
cuDNN Build Version   : 7500
cuDNN Version         : 7500
NCCL Build Version    : None
NCCL Runtime Version  : None