Failed to create cublas handle
Description
This code appears correct and works in Colab:
https://colab.research.google.com/drive/1b3XnflgL1yttHA5cOFKb3uSHPcHU64Hv?usp=share_link
But on an HP Victus laptop with an Intel Core CPU and an RTX 3050 GPU, it gives this error message:
2022-12-03 10:29:49.497205: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:219] failed to create cublas handle: cublas error
2022-12-03 10:29:49.497835: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:221] Failure to initialize cublas may be due to OOM (cublas needs some free memory when you initialize it, and your deep-learning framework may have preallocated more than its fair share), or may be because this binary was not built with support for the GPU in your machine.
2022-12-03 10:29:49.498309: E external/org_tensorflow/tensorflow/compiler/xla/status_macros.cc:57] INTERNAL: RET_CHECK failure (external/org_tensorflow/tensorflow/compiler/xla/service/gpu/gemm_algorithm_picker.cc:327) stream->parent()->GetBlasGemmAlgorithms(stream, &algorithms)
*** Begin stack trace ***
_PyObject_MakeTpCall
_PyEval_EvalFrameDefault
_PyFunction_Vectorcall
_PyEval_EvalFrameDefault
_PyFunction_Vectorcall
_PyEval_EvalFrameDefault
_PyFunction_Vectorcall
_PyEval_EvalFrameDefault
_PyFunction_Vectorcall
PyObject_Call
_PyEval_EvalFrameDefault
_PyFunction_Vectorcall
_PyEval_EvalFrameDefault
_PyFunction_Vectorcall
_PyEval_EvalFrameDefault
_PyFunction_Vectorcall
_PyEval_EvalFrameDefault
_PyFunction_Vectorcall
PyObject_Call
_PyEval_EvalFrameDefault
_PyFunction_Vectorcall
_PyEval_EvalFrameDefault
_PyFunction_Vectorcall
_PyObject_MakeTpCall
_PyEval_EvalFrameDefault
_PyFunction_Vectorcall
_PyEval_EvalFrameDefault
PyEval_EvalCode
_PyRun_SimpleFileObject
_PyRun_AnyFileObject
Py_RunMain
Py_BytesMain
__libc_start_main
_start
*** End stack trace ***
Traceback (most recent call last):
File "/home/reza/jjj3.py", line 17, in <module>
yy = (xx(3,4,5))
File "/home/reza/jjj3.py", line 15, in xx
return (A,B, jax.numpy.matmul(A,B))
File "/home/reza/.local/lib/python3.10/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback
return fun(*args, **kwargs)
File "/home/reza/.local/lib/python3.10/site-packages/jax/_src/api.py", line 622, in cache_miss
execute = dispatch._xla_call_impl_lazy(fun_, *tracers, **params)
File "/home/reza/.local/lib/python3.10/site-packages/jax/_src/dispatch.py", line 236, in _xla_call_impl_lazy
return xla_callable(fun, device, backend, name, donated_invars, keep_unused,
File "/home/reza/.local/lib/python3.10/site-packages/jax/linear_util.py", line 303, in memoized_fun
ans = call(fun, *args)
File "/home/reza/.local/lib/python3.10/site-packages/jax/_src/dispatch.py", line 360, in _xla_callable_uncached
keep_unused, *arg_specs).compile().unsafe_call
File "/home/reza/.local/lib/python3.10/site-packages/jax/_src/dispatch.py", line 996, in compile
self._executable = XlaCompiledComputation.from_xla_computation(
File "/home/reza/.local/lib/python3.10/site-packages/jax/_src/dispatch.py", line 1194, in from_xla_computation
compiled = compile_or_get_cached(backend, xla_computation, options,
File "/home/reza/.local/lib/python3.10/site-packages/jax/_src/dispatch.py", line 1077, in compile_or_get_cached
return backend_compile(backend, serialized_computation, compile_options,
File "/home/reza/.local/lib/python3.10/site-packages/jax/_src/profiler.py", line 314, in wrapper
return func(*args, **kwargs)
File "/home/reza/.local/lib/python3.10/site-packages/jax/_src/dispatch.py", line 1012, in backend_compile
return backend.compile(built_c, compile_options=options)
jax._src.traceback_util.UnfilteredStackTrace: jaxlib.xla_extension.XlaRuntimeError: INTERNAL: RET_CHECK failure (external/org_tensorflow/tensorflow/compiler/xla/service/gpu/gemm_algorithm_picker.cc:327) stream->parent()->GetBlasGemmAlgorithms(stream, &algorithms)
The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.
--------------------
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/reza/jjj3.py", line 17, in <module>
yy = (xx(3,4,5))
File "/home/reza/jjj3.py", line 15, in xx
return (A,B, jax.numpy.matmul(A,B))
jaxlib.xla_extension.XlaRuntimeError: INTERNAL: RET_CHECK failure (external/org_tensorflow/tensorflow/compiler/xla/service/gpu/gemm_algorithm_picker.cc:327) stream->parent()->GetBlasGemmAlgorithms(stream, &algorithms)
If I replace the random matrices with "ones", it works. Even the randomization itself seems to work; it is only the matmul on randomized matrices that fails.
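For reference, here is a minimal sketch of the failing pattern, reconstructed from the traceback above (the jax.random calls and key handling are my assumptions; only the matmul line is visible in the traceback):

import jax
import jax.numpy as jnp

def xx(m, n, k):
    # Random inputs reproduce the failure; jnp.ones of the same shapes do not.
    key = jax.random.PRNGKey(0)
    k1, k2 = jax.random.split(key)
    A = jax.random.uniform(k1, (m, n))
    B = jax.random.uniform(k2, (n, k))
    return (A, B, jnp.matmul(A, B))  # the XlaRuntimeError is raised here on GPU

yy = xx(3, 4, 5)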
I have installed jax with
pip install --upgrade pip
pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
and CUDA 11.8 with the deb install from Nvidia.
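As a quick sanity check that the CUDA wheel is actually in use (a minimal sketch; the exact device repr differs across jax versions):

import jax
print(jax.__version__)
print(jax.devices())  # should list a GPU/CUDA device, not CpuDevice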
The NVIDIA CUDA matmul sample code works: it can multiply ~3000x3000 matrices very easily.
I also tried explicitly writing "import jax.numpy as jnp" and still got the error.
What's the problem?! Do I basically have to compile jax from source for my machine?
What jax/jaxlib version are you using?
jax 0.3.25 jaxlib 0.3.25
Which accelerator(s) are you using?
RTX 3050 CUDA
Additional system info
Python 3.10.6 Ubuntu 22.04 latest updates + CUDA 11.8
NVIDIA GPU info
reza@HP:~$ nvidia-smi
Sat Dec 3 10:19:39 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05 Driver Version: 520.61.05 CUDA Version: 11.8 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:01:00.0 Off | N/A |
| N/A 28C P0 N/A / N/A | 5MiB / 4096MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2014 G /usr/lib/xorg/Xorg 4MiB |
+-----------------------------------------------------------------------------+
reza@HP:~$
@RezaRob I fixed the issue on my side. First I downgraded to
jax==0.3.22
following @tanmoyio. This didn't solve the error but rather changed it to
jaxlib.xla_extension.XlaRuntimeError: INTERNAL: Attempting to perform BLAS operation using StreamExecutor without BLAS support
Googling this led me to the actual fix, which was to set gpu_options.allow_growth = True. Full code:
import tensorflow as tf

print("executing TF bug workaround")
# Cap TF's GPU memory preallocation and let allocations grow on demand,
# so cuBLAS still has free memory when it initializes.
config = tf.compat.v1.ConfigProto(
    gpu_options=tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=0.8)
)
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)
tf.compat.v1.keras.backend.set_session(session)
This needs to be executed at the start of your program; it is a common TF bug workaround.
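For a pure-JAX program, the analogous knob is XLA's preallocation behavior: JAX preallocates most of the GPU's memory by default, which can leave cuBLAS nothing to initialize with on a 4 GB card. A minimal sketch using JAX's documented GPU-memory environment variables (whether they resolve this specific RTX 3050 setup is an assumption):

import os
# Must be set before jax initializes its GPU backend.
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"
# Alternatively, cap the preallocated fraction instead of disabling it:
# os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "0.8"

import jax
import jax.numpy as jnp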
Jax and TF should host a masterclass in bad error reporting