
Failed to create cublas handle


Description

This code appears correct and working:

https://colab.research.google.com/drive/1b3XnflgL1yttHA5cOFKb3uSHPcHU64Hv?usp=share_link

But on an HP Victus laptop with an Intel Core CPU and an RTX 3050 GPU, it gives this error message:

2022-12-03 10:29:49.497205: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:219] failed to create cublas handle: cublas error
2022-12-03 10:29:49.497835: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:221] Failure to initialize cublas may be due to OOM (cublas needs some free memory when you initialize it, and your deep-learning framework may have preallocated more than its fair share), or may be because this binary was not built with support for the GPU in your machine.
2022-12-03 10:29:49.498309: E external/org_tensorflow/tensorflow/compiler/xla/status_macros.cc:57] INTERNAL: RET_CHECK failure (external/org_tensorflow/tensorflow/compiler/xla/service/gpu/gemm_algorithm_picker.cc:327) stream->parent()->GetBlasGemmAlgorithms(stream, &algorithms) 
*** Begin stack trace ***
    _PyObject_MakeTpCall
    _PyEval_EvalFrameDefault
    _PyFunction_Vectorcall
    _PyEval_EvalFrameDefault
    _PyFunction_Vectorcall
    _PyEval_EvalFrameDefault
    _PyFunction_Vectorcall
    _PyEval_EvalFrameDefault
    _PyFunction_Vectorcall
    PyObject_Call
    _PyEval_EvalFrameDefault
    _PyFunction_Vectorcall
    _PyEval_EvalFrameDefault
    _PyFunction_Vectorcall
    _PyEval_EvalFrameDefault
    _PyFunction_Vectorcall
    _PyEval_EvalFrameDefault
    _PyFunction_Vectorcall
    PyObject_Call
    _PyEval_EvalFrameDefault
    _PyFunction_Vectorcall
    _PyEval_EvalFrameDefault
    _PyFunction_Vectorcall
    _PyObject_MakeTpCall
    _PyEval_EvalFrameDefault
    _PyFunction_Vectorcall
    _PyEval_EvalFrameDefault
    PyEval_EvalCode
    _PyRun_SimpleFileObject
    _PyRun_AnyFileObject
    Py_RunMain
    Py_BytesMain
    __libc_start_main
    _start
*** End stack trace ***

Traceback (most recent call last):
  File "/home/reza/jjj3.py", line 17, in <module>
    yy = (xx(3,4,5))
  File "/home/reza/jjj3.py", line 15, in xx
    return (A,B, jax.numpy.matmul(A,B))
  File "/home/reza/.local/lib/python3.10/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback
    return fun(*args, **kwargs)
  File "/home/reza/.local/lib/python3.10/site-packages/jax/_src/api.py", line 622, in cache_miss
    execute = dispatch._xla_call_impl_lazy(fun_, *tracers, **params)
  File "/home/reza/.local/lib/python3.10/site-packages/jax/_src/dispatch.py", line 236, in _xla_call_impl_lazy
    return xla_callable(fun, device, backend, name, donated_invars, keep_unused,
  File "/home/reza/.local/lib/python3.10/site-packages/jax/linear_util.py", line 303, in memoized_fun
    ans = call(fun, *args)
  File "/home/reza/.local/lib/python3.10/site-packages/jax/_src/dispatch.py", line 360, in _xla_callable_uncached
    keep_unused, *arg_specs).compile().unsafe_call
  File "/home/reza/.local/lib/python3.10/site-packages/jax/_src/dispatch.py", line 996, in compile
    self._executable = XlaCompiledComputation.from_xla_computation(
  File "/home/reza/.local/lib/python3.10/site-packages/jax/_src/dispatch.py", line 1194, in from_xla_computation
    compiled = compile_or_get_cached(backend, xla_computation, options,
  File "/home/reza/.local/lib/python3.10/site-packages/jax/_src/dispatch.py", line 1077, in compile_or_get_cached
    return backend_compile(backend, serialized_computation, compile_options,
  File "/home/reza/.local/lib/python3.10/site-packages/jax/_src/profiler.py", line 314, in wrapper
    return func(*args, **kwargs)
  File "/home/reza/.local/lib/python3.10/site-packages/jax/_src/dispatch.py", line 1012, in backend_compile
    return backend.compile(built_c, compile_options=options)
jax._src.traceback_util.UnfilteredStackTrace: jaxlib.xla_extension.XlaRuntimeError: INTERNAL: RET_CHECK failure (external/org_tensorflow/tensorflow/compiler/xla/service/gpu/gemm_algorithm_picker.cc:327) stream->parent()->GetBlasGemmAlgorithms(stream, &algorithms)

The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.

--------------------

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/reza/jjj3.py", line 17, in <module>
    yy = (xx(3,4,5))
  File "/home/reza/jjj3.py", line 15, in xx
    return (A,B, jax.numpy.matmul(A,B))
jaxlib.xla_extension.XlaRuntimeError: INTERNAL: RET_CHECK failure (external/org_tensorflow/tensorflow/compiler/xla/service/gpu/gemm_algorithm_picker.cc:327) stream->parent()->GetBlasGemmAlgorithms(stream, &algorithms) 

If you replace the random matrices with “ones”, it works. Even generating the random values itself seems to work; it is only the matmul on the randomized matrices that fails.
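
The jjj3.py script itself is not shown in the issue; a minimal sketch of what the traceback suggests (the random-key setup here is assumed, only the shapes and the matmul come from the trace) is:

import jax
import jax.numpy as jnp

def xx(n, m, k):
    # Random inputs reportedly trigger the failure; jnp.ones(...) works fine.
    key1, key2 = jax.random.split(jax.random.PRNGKey(0))
    A = jax.random.normal(key1, (n, m))
    B = jax.random.normal(key2, (m, k))
    return (A, B, jnp.matmul(A, B))  # the GEMM that hits cuBLAS

yy = xx(3, 4, 5)
print(yy[2])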

I have installed jax with

pip install --upgrade pip

pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

and CUDA 11.8 with the deb install from Nvidia.
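
As a sanity check (not from the original report), one quick way to confirm that the installed jaxlib is the CUDA build and that JAX actually sees the GPU:

import jax
import jaxlib

# A CPU-only jaxlib here would explain cuBLAS-related failures.
print("jax", jax.__version__, "jaxlib", jaxlib.__version__)

# Should report 'gpu' and list a CUDA device if the CUDA wheel is installed correctly.
print(jax.default_backend())
print(jax.devices())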

The CUDA matmul sample code works: it can multiply ~3000x3000 matrices easily.

I tried to explicitly say “import jax.numpy as jnp” and still got an error.

What’s the problem?! Do I basically have to compile jax for my machine?

What jax/jaxlib version are you using?

jax 0.3.25, jaxlib 0.3.25

Which accelerator(s) are you using?

RTX 3050 CUDA

Additional system info

Python 3.10.6, Ubuntu 22.04 (latest updates), CUDA 11.8

NVIDIA GPU info

reza@HP:~$ nvidia-smi 
Sat Dec  3 10:19:39 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| N/A   28C    P0    N/A /  N/A |      5MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2014      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
reza@HP:~$ 

Issue Analytics

  • State: open
  • Created: 10 months ago
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

2 reactions
shaneacton commented, Dec 12, 2022

@RezaRob I fixed the issue on my side. First I downgraded to jax==0.3.22 following @tanmoyio; this didn’t solve the error but rather changed it to jaxlib.xla_extension.XlaRuntimeError: INTERNAL: Attempting to perform BLAS operation using StreamExecutor without BLAS support. Googling this led me to the actual fix, which was to set gpu_options.allow_growth = True.

Full code:

import tensorflow as tf

print("executing TF bug workaround")
config = tf.compat.v1.ConfigProto(gpu_options=tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=0.8))
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)
tf.compat.v1.keras.backend.set_session(session)

This needs to be executed at the start of your program. It is a common TF bug workaround.
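
The snippet above is a TensorFlow-side workaround; for JAX itself, the closest equivalent is to keep XLA from preallocating most of the GPU memory via JAX’s allocator environment variables (whether that resolves this particular RET_CHECK failure is not confirmed in the thread). A minimal sketch:

import os

# Must be set before jax initializes the GPU backend.
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"    # allocate on demand instead of preallocating most of the GPU memory
# os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "0.5"   # alternatively, cap the preallocated fraction

import jax.numpy as jnp

# Small GEMM to exercise cuBLAS after changing the allocator behaviour.
print(jnp.matmul(jnp.ones((3, 4)), jnp.ones((4, 5))))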

1 reaction
shaneacton commented, Dec 12, 2022

Jax and TF should host a masterclass in bad error reporting


Top Results From Across the Web

failed to create cublas handle ...
When I have low memory and ask for a new session for detection I hit this error, when I clear the gpu of...

Failed to create CUBLAS handle. Tensorflow interaction with ...
The PyPi build of Tensorflow GPU 2.2 uses CUDA 10.1 and libcublas 10.2.1.243, but I had cublas 10.2.2.89 installed. To solve it: Centos:...

CUBLAS_STATUS_NOT_iNITIA...
This greedy allocation method uses up nearly all GPU memory. When CUBLAS is asked to initialize (later), it requires some GPU memory to ...

Caffe: make runtest error "Cannot create Cublas handle ...
I'm using Ubuntu 14.04, cuda toolkit 7.5, nvidia driver 352. (checking nvidia-smi and nvcc --version, the driver and the cuda toolkit version can...

[TensorFlow 2] failed to create cublas handle ... - Naver Blog
failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED is an error that occurs while allocating an operation in GPU memory. There are several possible causes, but ...
