`Blas xGEMV launch failed` but it worked on the same GPU last week

TensorFlow: 2.8.2+zzzcolab20220929150707
Driver Version: 460.32.03
CUDA Version: 11.2
GPU: Tesla T4
Tier: Colab Pro (I have enough compute units)

/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py in error_handler(*args, **kwargs)
     65     except Exception as e:  # pylint: disable=broad-except
     66       filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67       raise e.with_traceback(filtered_tb) from None
     68     finally:
     69       del filtered_tb

/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py in raise_from_not_ok_status(e, name)
   7162 def raise_from_not_ok_status(e, name):
   7163   e.message += (" name: " + name if name is not None else "")
-> 7164   raise core._status_to_exception(e) from None  # pylint: disable=protected-access
   7165 
   7166 

InternalError: Exception encountered when calling layer "dense_1" (type Dense).

Blas xGEMV launch failed : a.shape=[1,4704000,8], b.shape=[1,8,1], m=4704000, n=1, k=8 [Op:MatMul]

Call arguments received by layer "dense_1" (type Dense):
  • inputs=tf.Tensor(shape=(196, 24000, 8), dtype=float32)

I found that this error suggests the GPU is out of memory. But this NN training ran correctly on a Tesla T4 last week (Oct. 6th).
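The shapes in the error message can be reconstructed by hand, which also gives a rough sense of whether memory is a likely culprit. This is a minimal sketch, assuming (as Keras documents for inputs of rank > 2) that the Dense layer contracts only the last axis of its (196, 24000, 8) input, so the two leading axes are folded into m, and that units=1, as implied by b.shape=[1,8,1]. With n=1 the product is a matrix-vector multiply, which matches the xGEMV routine (rather than xGEMM) named in the error. The helpers `dense_matmul_dims` and `operand_bytes` are hypothetical, and the byte count covers only the float32 operands of this one MatMul, not the rest of the model or TensorFlow's allocator overhead.

```python
from functools import reduce
from operator import mul

BYTES_PER_FLOAT32 = 4
# Nominal Tesla T4 memory (16 GB); the usable amount reported by nvidia-smi is a bit lower.
T4_NOMINAL_MEMORY_BYTES = 16 * 1024**3


def dense_matmul_dims(input_shape, units):
    """Hypothetical helper: (m, k, n) of the 2-D matmul a Dense layer reduces to.

    All leading axes of the input are folded into m, the last axis is the
    contracted axis k, and n is the number of output units.
    """
    *leading, k = input_shape
    m = reduce(mul, leading, 1)  # functools.reduce also works on Colab's Python 3.7
    return m, k, units


def operand_bytes(m, k, n, bytes_per_elem=BYTES_PER_FLOAT32):
    """Bytes for A (m x k), B (k x n) and the result C (m x n) of one matmul."""
    return (m * k + k * n + m * n) * bytes_per_elem


# Input shape from the traceback; units=1 is read off b.shape=[1,8,1].
m, k, n = dense_matmul_dims((196, 24000, 8), units=1)
total = operand_bytes(m, k, n)
print(f"m={m}, k={k}, n={n}")  # prints m=4704000, k=8, n=1
print(f"operands: {total / 1024**2:.1f} MiB "
      f"({100 * total / T4_NOMINAL_MEMORY_BYTES:.2f}% of a nominal 16 GB T4)")
```

For this one MatMul the operands come to roughly 160 MiB, about 1% of the card. Other layers, gradients and allocator overhead add to that, so the estimate does not rule out memory pressure on its own, but a single oversized allocation looks unlikely, which fits the cuBLAS diagnosis in the comments.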

Now, training with the same code and the same datasets fails. I also tried TF 2.7.4 and 2.9.2, and got the same error.

I also compared the driver and CUDA versions with those from another project in which I ran `!nvidia-smi` last week, while it was still working correctly. Both versions are the same.

Additionally, that other project also fails now; it uses the same NN and very similar datasets.

Has Colab updated something? Should I upgrade or downgrade something, such as the CUDA version?

Top GitHub Comments

metrizable commented, Oct 18, 2022

Thanks for sharing the notebook. I was able to reproduce the error from the OP: Blas xGEMV launch failed...

I have determined that the error raised in your notebook is due to the version of cuBLAS available in Colab. A similar issue was captured in https://github.com/tensorflow/tensorflow/issues/54463 and is noted in the cuBLAS release notes as fixed in version 11.4: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cublas-11.4.0

A workaround that worked for me, and that you may want to try, is to install libcublas-11-4 in the first cell:

!apt-get install libcublas-11-4

Then execute the remainder of your notebook as normal.

This may provide a way forward until an upgraded libcublas can be brought into Colab.
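Because the diagnosis hinges on which cuBLAS build is loaded, it can help to ask the library for its version directly, before and after the workaround. The sketch below is an illustration under stated assumptions, not part of the original comment: it loads `libcublas.so.11` with Python's `ctypes` and calls `cublasGetProperty` for the major, minor and patch numbers. Two caveats apply. First, `ctypes` goes through the dynamic loader's normal search, so if TensorFlow has already loaded that library in the same process you should get the same copy, otherwise possibly a different one; whether the apt-installed package is the copy that gets picked up depends on the library search path, which this sketch does not inspect. Second, cuBLAS reports its own library version, which need not match the CUDA toolkit version in a package name such as `libcublas-11-4`.

```python
import ctypes

# libraryPropertyType values from CUDA's library_types.h
MAJOR_VERSION, MINOR_VERSION, PATCH_LEVEL = 0, 1, 2
CUBLAS_STATUS_SUCCESS = 0


def cublas_version(soname="libcublas.so.11"):
    """Return cuBLAS's own (major, minor, patch) via cublasGetProperty,
    or None if the library cannot be loaded or the query fails."""
    try:
        lib = ctypes.CDLL(soname)
    except OSError:
        return None  # e.g. no CUDA libraries on this machine
    get_property = lib.cublasGetProperty
    get_property.argtypes = [ctypes.c_int, ctypes.POINTER(ctypes.c_int)]
    get_property.restype = ctypes.c_int  # cublasStatus_t
    parts = []
    for prop in (MAJOR_VERSION, MINOR_VERSION, PATCH_LEVEL):
        value = ctypes.c_int()
        if get_property(prop, ctypes.byref(value)) != CUBLAS_STATUS_SUCCESS:
            return None
        parts.append(value.value)
    return tuple(parts)


if __name__ == "__main__":
    print(cublas_version())
```

In a notebook, running this in a cell after TensorFlow has already executed a GPU op makes it more likely that the reported version is the one TensorFlow is actually using.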

arvindrajan92 commented, Oct 25, 2022

Hi @wp45rw,

As I have mentioned here, would you mind trying a downgrade from CUDA 11.2 to CUDA 11.1?

If you are using a conda environment and following the installation steps in https://www.tensorflow.org/install/pip, running `conda install -c conda-forge cudatoolkit=11.1 cudatoolkit-dev=11.1 cudnn=8.1.0 -y` sets up my environment perfectly. I have been doing this to work around the issue with CUDA 11.2, and you might find it helpful as well.

Test With CUDA 11.1 (No Error)

conda create -n cuda111 python=3.9.12 -y
conda activate cuda111
conda install -c conda-forge cudatoolkit=11.1 cudatoolkit-dev=11.1 cudnn=8.1.0 -y
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/
pip install tensorflow==2.9.1
python -c "import tensorflow as tf; empty_image = tf.zeros(shape=[1280, 1280, 3], dtype=tf.float32); gray_image = tf.image.rgb_to_grayscale(empty_image); print(tf.shape(gray_image))"

Test With CUDA 11.2 (Throws INTERNAL: Blas xGEMV launch failed Error)

conda create -n cuda112 python=3.9.12 -y
conda activate cuda112
conda install -c conda-forge cudatoolkit=11.2 cudatoolkit-dev=11.2 cudnn=8.1.0 -y
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/
pip install tensorflow==2.9.1
python -c "import tensorflow as tf; empty_image = tf.zeros(shape=[1280, 1280, 3], dtype=tf.float32); gray_image = tf.image.rgb_to_grayscale(empty_image); print(tf.shape(gray_image))"
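A grayscale conversion is a reasonable stand-in for the Dense layer here because both reduce to the same kind of MatMul. This is a minimal sketch, assuming (as TensorFlow's implementation appears to) that `tf.image.rgb_to_grayscale` computes a weighted sum over the channel axis with `tf.tensordot` and a length-3 weight vector, and that tensordot reshapes its operands into a single 2-D matmul. The helper `last_axis_contraction_dims` is hypothetical. Under those assumptions the 1280×1280×3 test image gives m=1,638,400, k=3, n=1: the same tall matrix-times-vector pattern (n=1) as the m=4704000, k=8, n=1 MatMul in the original error, so both end up in a GEMV call.

```python
from functools import reduce
from operator import mul


def last_axis_contraction_dims(a_shape, b_shape):
    """Hypothetical helper: (m, k, n) of the 2-D matmul that contracting a's
    last axis with b's first axis reduces to, in the style of tf.tensordot."""
    *a_free, k = a_shape
    k_b, *b_free = b_shape
    if k != k_b:
        raise ValueError(f"contracted axes differ: {k} vs {k_b}")
    m = reduce(mul, a_free, 1)  # all free axes of a are folded into the rows
    n = reduce(mul, b_free, 1)  # a rank-1 b (a plain vector) gives n == 1
    return m, k, n


# Grayscale test: image [1280, 1280, 3] contracted with 3 RGB weights.
print(last_axis_contraction_dims((1280, 1280, 3), (3,)))   # (1638400, 3, 1)
# Dense layer from the original error: input [196, 24000, 8], kernel [8, 1].
print(last_axis_contraction_dims((196, 24000, 8), (8, 1)))  # (4704000, 8, 1)
```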
