`Blas xGEMV launch failed` but it worked on the same GPU last week
tensorflow: 2.8.2+zzzcolab20220929150707
Driver Version: 460.32.03
CUDA Version: 11.2
GPU: Tesla T4
Tier: Colab Pro (I have enough “compute units”).
/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py in error_handler(*args, **kwargs)
65 except Exception as e: # pylint: disable=broad-except
66 filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67 raise e.with_traceback(filtered_tb) from None
68 finally:
69 del filtered_tb
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py in raise_from_not_ok_status(e, name)
7162 def raise_from_not_ok_status(e, name):
7163 e.message += (" name: " + name if name is not None else "")
-> 7164 raise core._status_to_exception(e) from None # pylint: disable=protected-access
7165
7166
InternalError: Exception encountered when calling layer "dense_1" (type Dense).
Blas xGEMV launch failed : a.shape=[1,4704000,8], b.shape=[1,8,1], m=4704000, n=1, k=8 [Op:MatMul]
Call arguments received by layer "dense_1" (type Dense):
• inputs=tf.Tensor(shape=(196, 24000, 8), dtype=float32)
From what I found, this error suggests the GPU is out of memory. However, this NN training was working correctly on a Tesla T4 last week (Oct. 6th).
Now, training with the same code and the same datasets does not work. I tried tf 2.7.4 and 2.9.2, but both fail with the same error.
I also compared the driver and CUDA versions against another project where I ran `!nvidia-smi` last week and training worked correctly. Both versions are the same.
Additionally, that other project no longer works either; it uses the same NN and very similar datasets.
Has Colab updated something? Should I update or downgrade something, such as the CUDA version?
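For what it's worth, the sizes in the error message can be sanity-checked with plain arithmetic. The sketch below assumes the Dense layer collapses all leading dimensions of its input into a single row dimension before the matrix multiply, which matches the m, n, k values reported in the error. The helper names are hypothetical and not part of TensorFlow.

```python
def dense_matmul_dims(input_shape, units):
    """Return (m, n, k) for a Dense layer applied to `input_shape`.

    Assumption: all leading dimensions are flattened into m,
    the last input dimension is k, and the layer width is n.
    """
    m = 1
    for dim in input_shape[:-1]:
        m *= dim
    k = input_shape[-1]
    return m, units, k


def float32_bytes(*dims):
    """Size in bytes of a float32 tensor with the given dimensions."""
    total = 4  # bytes per float32 element
    for dim in dims:
        total *= dim
    return total


m, n, k = dense_matmul_dims((196, 24000, 8), units=1)
print("m, n, k:", m, n, k)
print("input bytes:", float32_bytes(m, k))
print("output bytes:", float32_bytes(m, n))
```

With the shapes from the traceback, m comes out as 196 × 24000 = 4704000, and the two operands together take roughly 170 MB in float32. That is small next to the 16 GB on a T4, so this single multiply would not fill the card by itself, although it says nothing about the memory used by the rest of the model.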
Thanks for sharing the notebook. I was able to reproduce the error from the OP:
Blas xGEMV launch failed...
I have determined that the error raised in your notebook is due to the version of cuBLAS available in Colab. A similar issue was captured in https://github.com/tensorflow/tensorflow/issues/54463 and is noted in the cuBLAS release notes as fixed in version 11.4: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cublas-11.4.0
A workaround that I had success with is to install `libcublas-11-4` in a first cell, then execute the remainder of your notebook as normal.
This may provide a way forward until an upgraded libcublas can be brought into Colab.
Hi @wp45rw,
As I have mentioned here, would you mind trying a downgrade to CUDA 11.1?
If you are using a conda environment and following the installation steps in https://www.tensorflow.org/install/pip, the command
conda install -c conda-forge cudatoolkit=11.1 cudatoolkit-dev=11.1 cudnn=8.1.0 -y
sets up my environment perfectly. I have been doing this to circumvent the issue with CUDA 11.2, and you might find it helpful as well.
Test With CUDA 11.1 (No Error)
Test With CUDA 11.2 (Throws INTERNAL: Blas xGEMV launch failed Error)
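To summarize the version reports in this thread, here is a minimal sketch that classifies a CUDA version string such as the one `!nvidia-smi` prints. The function names are hypothetical, and the mapping only encodes what was reported above (11.1 working, 11.2 failing, fix noted for 11.4). Treating versions later than 11.4 as fixed is an assumption, and 11.3 is not covered by anything in the thread.

```python
def parse_version(text):
    """Turn '11.2' or '11.4.0' into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def xgemv_status(cuda_version):
    """Classify a CUDA version by what this thread reports about it."""
    major_minor = parse_version(cuda_version)[:2]
    if major_minor >= (11, 4):
        # Assumption: the fix noted for 11.4 carries over to later releases.
        return "reported fixed"
    if major_minor == (11, 2):
        return "reported failing"
    if major_minor == (11, 1):
        return "reported working"
    return "not covered in this thread"


for version in ("11.1", "11.2", "11.3", "11.4.0"):
    print(version, "->", xgemv_status(version))
```

This is only a lookup of the anecdotes above, not a test of the installed library; the actual check is still whether the MatMul from the original notebook runs.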