Failed to determine best cudnn convolution algorithm / No GPU/TPU found
Environment: RTX 3080 / CUDA 11.1 / cuDNN 8.2.1 / Ubuntu 16.04
This problem occurs with jaxlib-0.1.72+cuda111: I get the convolution autotuning error shown below. When I upgrade to jaxlib 0.1.74 the error disappears, but then JAX cannot detect the GPU at all, while TensorFlow still can.
So whether I use 0.1.72 or 0.1.74, I always hit a problem.
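For reference, here is a minimal check (assuming both jax and tensorflow are installed in the same environment) to confirm which framework actually sees the card:

```python
# Compare which accelerators JAX and TensorFlow report.
import jax
import tensorflow as tf

print("JAX devices:", jax.devices())                       # with jaxlib 0.1.74 this only lists the CPU
print("TF GPUs:", tf.config.list_physical_devices("GPU"))  # TensorFlow still reports the RTX 3080
```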
```
RuntimeError: UNKNOWN: Failed to determine best cudnn convolution algorithm: INTERNAL: All algorithms tried for %custom-call.1 = (f32[1,112,112,64]{2,1,3,0}, u8[0]{0}) custom-call(f32[1,229,229,3]{2,1,3,0} %pad, f32[7,7,3,64]{1,0,2,3} %copy.4), window={size=7x7 stride=2x2}, dim_labels=b01f_01io->b01f, custom_call_target="__cudnn$convForward", metadata={op_type="conv_general_dilated" op_name="jit(conv_general_dilated)/conv_general_dilated[\n batch_group_count=1\n dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 3, 1, 2), rhs_spec=(3, 2, 0, 1), out_spec=(0, 3, 1, 2))\n feature_group_count=1\n lhs_dilation=(1, 1)\n lhs_shape=(1, 224, 224, 3)\n padding=((2, 3), (2, 3))\n precision=None\n preferred_element_type=None\n rhs_dilation=(1, 1)\n rhs_shape=(7, 7, 3, 64)\n window_strides=(2, 2)\n]" source_file="/media/node/Materials/anaconda3/envs/xmcgan/lib/python3.9/site-packages/flax/linen/linear.py" source_line=282}, backend_config="{"algorithm":"0","tensor_ops_enabled":false,"conv_result_scale":1,"activation_mode":"0","side_input_scale":0}" failed. Falling back to default algorithm.
Convolution performance may be suboptimal. To ignore this failure and try to use a fallback algorithm, use XLA_FLAGS=--xla_gpu_strict_conv_algorithm_picker=false. Please also file a bug for the root cause of failing autotuning.
```
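The error message itself suggests a stopgap. A minimal sketch of applying it (this only tolerates the autotuning failure and does not fix the root cause):

```python
# Sketch: apply the fallback hint from the error message.
# XLA_FLAGS must be set before the XLA GPU backend is initialized,
# so set it before importing jax (or export it in the shell).
import os
os.environ["XLA_FLAGS"] = "--xla_gpu_strict_conv_algorithm_picker=false"

import jax
import jax.numpy as jnp

# Tiny convolution to exercise the cuDNN path that was failing above.
x = jnp.ones((1, 3, 224, 224))   # NCHW input
w = jnp.ones((64, 3, 7, 7))      # OIHW kernel
y = jax.lax.conv(x, w, window_strides=(2, 2), padding="SAME")
print(y.shape)
```

Equivalently, export `XLA_FLAGS=--xla_gpu_strict_conv_algorithm_picker=false` in the shell before launching Python.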
Turns out it was an OOM error with a bad error message. The solution is in #8506: use the environment flag `XLA_PYTHON_CLIENT_MEM_FRACTION=0.87` (a short sketch is below). It appears there is some kind of issue with how jax.scipy.signal.convolve2d handles preallocated memory. I believe it would be nice to have a better error message for this.

Did you fix the error?
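For anyone who lands here, a minimal sketch of the workaround from the comment above: cap how much GPU memory JAX preallocates so cuDNN autotuning has workspace left. The 0.87 value is simply the one quoted above, and the variable must be set before JAX initializes the GPU backend.

```python
# Workaround sketch: shrink JAX's GPU memory preallocation so cuDNN
# autotuning is not starved for memory. Must run before JAX touches
# the GPU (i.e. before the first device query or computation).
import os
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "0.87"  # value taken from the comment above

import jax
import jax.numpy as jnp

print(jax.devices())          # the GPU should be listed
x = jnp.ones((1, 224, 224, 3))
print(jnp.sum(x))             # simple op to confirm the backend works
```

The same thing can be done from the shell, e.g. `XLA_PYTHON_CLIENT_MEM_FRACTION=0.87 python train.py`.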