
Failed to create cublas handle


Description

This code appears correct and working:

https://colab.research.google.com/drive/1b3XnflgL1yttHA5cOFKb3uSHPcHU64Hv?usp=share_link

But on an HP Victus laptop with an Intel Core CPU and an RTX 3050 GPU, it gives this error message:

2022-12-03 10:29:49.497205: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:219] failed to create cublas handle: cublas error
2022-12-03 10:29:49.497835: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:221] Failure to initialize cublas may be due to OOM (cublas needs some free memory when you initialize it, and your deep-learning framework may have preallocated more than its fair share), or may be because this binary was not built with support for the GPU in your machine.
2022-12-03 10:29:49.498309: E external/org_tensorflow/tensorflow/compiler/xla/status_macros.cc:57] INTERNAL: RET_CHECK failure (external/org_tensorflow/tensorflow/compiler/xla/service/gpu/gemm_algorithm_picker.cc:327) stream->parent()->GetBlasGemmAlgorithms(stream, &algorithms) 
*** Begin stack trace ***
    _PyObject_MakeTpCall
    _PyEval_EvalFrameDefault
    _PyFunction_Vectorcall
    _PyEval_EvalFrameDefault
    _PyFunction_Vectorcall
    _PyEval_EvalFrameDefault
    _PyFunction_Vectorcall
    _PyEval_EvalFrameDefault
    _PyFunction_Vectorcall
    PyObject_Call
    _PyEval_EvalFrameDefault
    _PyFunction_Vectorcall
    _PyEval_EvalFrameDefault
    _PyFunction_Vectorcall
    _PyEval_EvalFrameDefault
    _PyFunction_Vectorcall
    _PyEval_EvalFrameDefault
    _PyFunction_Vectorcall
    PyObject_Call
    _PyEval_EvalFrameDefault
    _PyFunction_Vectorcall
    _PyEval_EvalFrameDefault
    _PyFunction_Vectorcall
    _PyObject_MakeTpCall
    _PyEval_EvalFrameDefault
    _PyFunction_Vectorcall
    _PyEval_EvalFrameDefault
    PyEval_EvalCode
    _PyRun_SimpleFileObject
    _PyRun_AnyFileObject
    Py_RunMain
    Py_BytesMain
    __libc_start_main
    _start
*** End stack trace ***

Traceback (most recent call last):
  File "/home/reza/jjj3.py", line 17, in <module>
    yy = (xx(3,4,5))
  File "/home/reza/jjj3.py", line 15, in xx
    return (A,B, jax.numpy.matmul(A,B))
  File "/home/reza/.local/lib/python3.10/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback
    return fun(*args, **kwargs)
  File "/home/reza/.local/lib/python3.10/site-packages/jax/_src/api.py", line 622, in cache_miss
    execute = dispatch._xla_call_impl_lazy(fun_, *tracers, **params)
  File "/home/reza/.local/lib/python3.10/site-packages/jax/_src/dispatch.py", line 236, in _xla_call_impl_lazy
    return xla_callable(fun, device, backend, name, donated_invars, keep_unused,
  File "/home/reza/.local/lib/python3.10/site-packages/jax/linear_util.py", line 303, in memoized_fun
    ans = call(fun, *args)
  File "/home/reza/.local/lib/python3.10/site-packages/jax/_src/dispatch.py", line 360, in _xla_callable_uncached
    keep_unused, *arg_specs).compile().unsafe_call
  File "/home/reza/.local/lib/python3.10/site-packages/jax/_src/dispatch.py", line 996, in compile
    self._executable = XlaCompiledComputation.from_xla_computation(
  File "/home/reza/.local/lib/python3.10/site-packages/jax/_src/dispatch.py", line 1194, in from_xla_computation
    compiled = compile_or_get_cached(backend, xla_computation, options,
  File "/home/reza/.local/lib/python3.10/site-packages/jax/_src/dispatch.py", line 1077, in compile_or_get_cached
    return backend_compile(backend, serialized_computation, compile_options,
  File "/home/reza/.local/lib/python3.10/site-packages/jax/_src/profiler.py", line 314, in wrapper
    return func(*args, **kwargs)
  File "/home/reza/.local/lib/python3.10/site-packages/jax/_src/dispatch.py", line 1012, in backend_compile
    return backend.compile(built_c, compile_options=options)
jax._src.traceback_util.UnfilteredStackTrace: jaxlib.xla_extension.XlaRuntimeError: INTERNAL: RET_CHECK failure (external/org_tensorflow/tensorflow/compiler/xla/service/gpu/gemm_algorithm_picker.cc:327) stream->parent()->GetBlasGemmAlgorithms(stream, &algorithms)

The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.

--------------------

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/reza/jjj3.py", line 17, in <module>
    yy = (xx(3,4,5))
  File "/home/reza/jjj3.py", line 15, in xx
    return (A,B, jax.numpy.matmul(A,B))
jaxlib.xla_extension.XlaRuntimeError: INTERNAL: RET_CHECK failure (external/org_tensorflow/tensorflow/compiler/xla/service/gpu/gemm_algorithm_picker.cc:327) stream->parent()->GetBlasGemmAlgorithms(stream, &algorithms) 

If you replace the random matrices with “ones”, it works. Even generating the random values itself seems to work; it is only the matmul on the randomized matrices that fails.
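
The jjj3.py script itself is not shown in the issue; a minimal sketch of what the traceback suggests (the random-key setup here is assumed, only the shapes and the matmul come from the trace) is:

import jax
import jax.numpy as jnp

def xx(n, m, k):
    # Random inputs reportedly trigger the failure; jnp.ones(...) works fine.
    key1, key2 = jax.random.split(jax.random.PRNGKey(0))
    A = jax.random.normal(key1, (n, m))
    B = jax.random.normal(key2, (m, k))
    return (A, B, jnp.matmul(A, B))  # the GEMM that hits cuBLAS

yy = xx(3, 4, 5)
print(yy[2])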

I have installed jax with

pip install --upgrade pip

pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

and CUDA 11.8 with the deb install from Nvidia.
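
As a sanity check (not from the original report), one quick way to confirm that the installed jaxlib is the CUDA build and that JAX actually sees the GPU:

import jax
import jaxlib

# A CPU-only jaxlib here would explain cuBLAS-related failures.
print("jax", jax.__version__, "jaxlib", jaxlib.__version__)

# Should report 'gpu' and list a CUDA device if the CUDA wheel is installed correctly.
print(jax.default_backend())
print(jax.devices())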

The CUDA matmul sample code works: it can multiply ~3000x3000 matrices easily.

I tried to explicitly say “import jax.numpy as jnp” and still got an error.

What’s the problem?! Do I basically have to compile jax for my machine?

What jax/jaxlib version are you using?

jax 0.3.25, jaxlib 0.3.25

Which accelerator(s) are you using?

RTX 3050 CUDA

Additional system info

Python 3.10.6, Ubuntu 22.04 (latest updates), CUDA 11.8

NVIDIA GPU info

reza@HP:~$ nvidia-smi 
Sat Dec  3 10:19:39 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| N/A   28C    P0    N/A /  N/A |      5MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2014      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
reza@HP:~$ 

Issue Analytics

  • State: open
  • Created: 10 months ago
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

2 reactions
shaneacton commented, Dec 12, 2022

@RezaRob I fixed the issue on my side. First I downgraded to jax==0.3.22 following @tanmoyio; this didn’t solve the error but rather changed it to jaxlib.xla_extension.XlaRuntimeError: INTERNAL: Attempting to perform BLAS operation using StreamExecutor without BLAS support. Googling this led me to the actual fix, which was to set gpu_options.allow_growth = True.

Full code:

import tensorflow as tf

print("executing TF bug workaround")
config = tf.compat.v1.ConfigProto(gpu_options=tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=0.8))
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)
tf.compat.v1.keras.backend.set_session(session)

This needs to be executed at the start of your program. It is a common TF bug workaround.
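
The snippet above is a TensorFlow-side workaround; for JAX itself, the closest equivalent is to keep XLA from preallocating most of the GPU memory via JAX’s allocator environment variables (whether that resolves this particular RET_CHECK failure is not confirmed in the thread). A minimal sketch:

import os

# Must be set before jax initializes the GPU backend.
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"    # allocate on demand instead of preallocating most of the GPU memory
# os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "0.5"   # alternatively, cap the preallocated fraction

import jax.numpy as jnp

# Small GEMM to exercise cuBLAS after changing the allocator behaviour.
print(jnp.matmul(jnp.ones((3, 4)), jnp.ones((4, 5))))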

1 reaction
shaneacton commented, Dec 12, 2022

Jax and TF should host a masterclass in bad error reporting


Top Results From Across the Web

failed to create cublas handle ...
When I have low memory and ask for a new session for detection I hit this error, when I clear the gpu of...

Failed to create CUBLAS handle. Tensorflow interaction with ...
The PyPi build of Tensorflow GPU 2.2 uses CUDA 10.1 and libcublas 10.2.1.243, but I had cublas 10.2.2.89 installed. To solve it: Centos:...

CUBLAS_STATUS_NOT_iNITIA...
This greedy allocation method uses up nearly all GPU memory. When CUBLAS is asked to initialize (later), it requires some GPU memory to ...

Caffe: make runtest error "Cannot create Cublas handle ...
I'm using Ubuntu 14.04, cuda toolkit 7.5, nvidia driver 352. (checking nvidia-smi and nvcc --version, the driver and the cuda toolkit version can...

[TensorFlow 2] failed to create cublas handle ... - Naver Blog
failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED is an error that occurs while allocating an operation in GPU memory. There are several possible causes, but ...
