cuSolver internal error on freshly installed cuda11.1 from conda-forge

In a freshly installed Python 3.9 environment with CUDA 11.1 and cudnn, any call to jax.numpy.linalg.qr produces the error RuntimeError: jaxlib/cusolver.cc:52: operation cusolverDnCreate(&handle) failed: cuSolver internal error.

Installation:

conda create -n test -c conda-forge python=3.9 cudatoolkit=11.1 cudnn
conda activate test
pip install --upgrade "jax[cuda111]" -f https://storage.googleapis.com/jax-releases/jax_releases.html  # Note: wheels only available on linux.
python
>>> import jax
>>> jax.numpy.linalg.qr(jax.numpy.ones([3, 3]))
2021-09-25 11:10:05.278742: W external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/gnovikov/data/miniconda3/envs/test/lib/python3.9/site-packages/jax/_src/numpy/linalg.py", line 468, in qr
    q, r = lax_linalg.qr(a, full_matrices)
  File "/home/gnovikov/data/miniconda3/envs/test/lib/python3.9/site-packages/jax/_src/lax/linalg.py", line 197, in qr
    q, r = qr_p.bind(x, full_matrices=full_matrices)
  File "/home/gnovikov/data/miniconda3/envs/test/lib/python3.9/site-packages/jax/core.py", line 267, in bind
    out = top_trace.process_primitive(self, tracers, params)
  File "/home/gnovikov/data/miniconda3/envs/test/lib/python3.9/site-packages/jax/core.py", line 612, in process_primitive
    return primitive.impl(*tracers, **params)
  File "/home/gnovikov/data/miniconda3/envs/test/lib/python3.9/site-packages/jax/_src/lax/linalg.py", line 1092, in qr_impl
    q, r = xla.apply_primitive(qr_p, operand, full_matrices=full_matrices)
  File "/home/gnovikov/data/miniconda3/envs/test/lib/python3.9/site-packages/jax/interpreters/xla.py", line 275, in apply_primitive
    compiled_fun = xla_primitive_callable(prim, *unsafe_map(arg_spec, args), **params)
  File "/home/gnovikov/data/miniconda3/envs/test/lib/python3.9/site-packages/jax/_src/util.py", line 195, in wrapper
    return cached(config._trace_context(), *args, **kwargs)
  File "/home/gnovikov/data/miniconda3/envs/test/lib/python3.9/site-packages/jax/_src/util.py", line 188, in cached
    return f(*args, **kwargs)
  File "/home/gnovikov/data/miniconda3/envs/test/lib/python3.9/site-packages/jax/interpreters/xla.py", line 317, in xla_primitive_callable
    built_c = primitive_computation(prim, AxisEnv(nreps, (), ()), backend,
  File "/home/gnovikov/data/miniconda3/envs/test/lib/python3.9/site-packages/jax/_src/util.py", line 195, in wrapper
    return cached(config._trace_context(), *args, **kwargs)
  File "/home/gnovikov/data/miniconda3/envs/test/lib/python3.9/site-packages/jax/_src/util.py", line 188, in cached
    return f(*args, **kwargs)
  File "/home/gnovikov/data/miniconda3/envs/test/lib/python3.9/site-packages/jax/interpreters/xla.py", line 357, in primitive_computation
    ans = rule(c, *xla_args, **params)
  File "/home/gnovikov/data/miniconda3/envs/test/lib/python3.9/site-packages/jax/_src/lax/linalg.py", line 1144, in _qr_cpu_gpu_translation_rule
    r, tau, info_geqrf = geqrf_impl(c, operand)
  File "/home/gnovikov/data/miniconda3/envs/test/lib/python3.9/site-packages/jaxlib/cusolver.py", line 200, in geqrf
    lwork, opaque = cusolver_kernels.build_geqrf_descriptor(
RuntimeError: jaxlib/cusolver.cc:52: operation cusolverDnCreate(&handle) failed: cuSolver internal error

I am fairly sure that QR decomposition is not the only operation that fails in this setup. The same setup with cudatoolkit=10.2 (still from conda) works fine, and CUDA 11.1 installed outside conda on another machine works as well.
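
The warning in the log above says that libcusolver.so.11 could not be opened and prints the LD_LIBRARY_PATH that was searched. A minimal sketch for checking whether the library sits in the conda environment but outside that search path is shown below. The helper name locate_library is hypothetical, the sketch assumes the CONDA_PREFIX variable is set by conda activate, and it ignores the system default loader paths, so an empty result does not prove the library is unreachable.

```python
import os


def locate_library(name, environ):
    """Report where `name` exists: on LD_LIBRARY_PATH and in the conda env.

    `environ` is a mapping such as os.environ. Only directories named in
    LD_LIBRARY_PATH and `$CONDA_PREFIX/lib` are inspected; system default
    loader paths are NOT checked (assumption of this sketch).
    """
    search_dirs = [
        d for d in environ.get("LD_LIBRARY_PATH", "").split(os.pathsep) if d
    ]
    on_path = [
        os.path.join(d, name)
        for d in search_dirs
        if os.path.exists(os.path.join(d, name))
    ]

    in_env = []
    prefix = environ.get("CONDA_PREFIX")
    if prefix:
        candidate = os.path.join(prefix, "lib", name)
        if os.path.exists(candidate):
            in_env.append(candidate)

    return {"on_ld_library_path": on_path, "in_conda_env": in_env}
```

Calling locate_library("libcusolver.so.11", os.environ) inside the activated environment would be one way to use it: a hit under in_conda_env together with an empty on_ld_library_path list is consistent with the warning shown in the log.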

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 6

Top GitHub Comments

2 reactions
k-khr commented, Oct 19, 2021

I had the same issue and solved it by setting LD_LIBRARY_PATH to the lib directory of the conda environment:

export LD_LIBRARY_PATH=/path/to/miniconda3/envs/{your_env_name}/lib
python -c "import jax; jax.numpy.linalg.qr(jax.numpy.ones([3,3]))"
# works fine
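
The same workaround can be sketched in Python for cases where the command is launched from a script. The dynamic loader reads LD_LIBRARY_PATH when a process starts, so this sketch builds a modified environment and runs the command in a child process rather than changing os.environ in the current one. The helper names with_env_lib and run_with_env_lib are hypothetical, and prepending (instead of overwriting, as the export above does) is a design choice of the sketch.

```python
import os
import subprocess
import sys


def with_env_lib(environ, env_lib):
    """Return a copy of `environ` with `env_lib` first on LD_LIBRARY_PATH."""
    new_env = dict(environ)
    parts = [
        p for p in new_env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if p
    ]
    if env_lib not in parts:
        # Prepend so the conda copy of the CUDA libraries is found first.
        parts.insert(0, env_lib)
    new_env["LD_LIBRARY_PATH"] = os.pathsep.join(parts)
    return new_env


def run_with_env_lib(code, env_lib):
    """Run `python -c code` in a child process that sees the extended path."""
    return subprocess.run(
        [sys.executable, "-c", code],
        env=with_env_lib(os.environ, env_lib),
        check=False,
    )
```

For example, run_with_env_lib("import jax; jax.numpy.linalg.qr(jax.numpy.ones([3, 3]))", os.path.join(os.environ["CONDA_PREFIX"], "lib")) would repeat the check from the comment above, assuming CONDA_PREFIX points at the active environment.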

0 reactions
n-gao commented, Sep 8, 2022

LD_LIBRARY_PATH has to be set manually; is there a better solution for this? I set it per conda environment via conda env config vars set LD_LIBRARY_PATH=.... However, this has to be repeated for every new environment and is easy to forget.
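
Since the step has to be repeated for each environment, one option is to generate the command instead of typing it. The sketch below only builds the argument list for conda env config vars set; the helper name config_vars_command is hypothetical, and pointing the variable at the lib directory under the environment prefix is an assumption taken from the workaround in the earlier comment, not something this comment states.

```python
import os


def config_vars_command(env_prefix, existing=""):
    """Build a `conda env config vars set` command for one environment.

    `env_prefix` is the environment's root directory; `existing` is an
    optional LD_LIBRARY_PATH value to keep after the env's lib directory.
    """
    lib_dir = os.path.join(env_prefix, "lib")
    value = lib_dir if not existing else lib_dir + os.pathsep + existing
    return [
        "conda", "env", "config", "vars", "set",
        "-p", env_prefix,  # select the environment by prefix
        f"LD_LIBRARY_PATH={value}",
    ]
```

The returned list can be passed to subprocess.run right after an environment is created. The environment has to be reactivated before the variable takes effect.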


