
Errors while generating Image with GPU


Errors while trying to generate images with GPU:

XlaRuntimeError                           Traceback (most recent call last)
Input In [13], in <cell line: 9>()
     23 encoded_images = encoded_images.sequences[..., 1:]
     24 # decode images
---> 25 decoded_images = p_decode(encoded_images, vqgan_params)
     26 decoded_images = decoded_images.clip(0.0, 1.0).reshape((-1, 256, 256, 3))
     27 for decoded_img in decoded_images:

    [... skipping hidden 15 frame]

File /usr/local/lib/python3.8/dist-packages/jax/_src/dispatch.py:713, in backend_compile(backend, built_c, options)
    709 @profiler.annotate_function
    710 def backend_compile(backend, built_c, options):
    711   # we use a separate function call to ensure that XLA compilation appears
    712   # separately in Python profiling results
--> 713   return backend.compile(built_c, compile_options=options)

XlaRuntimeError: UNKNOWN: Failed to determine best cudnn convolution algorithm for:
%cudnn-conv-bias-activation.2 = (f32[2,16,16,256]{2,1,3,0}, u8[0]{0}) custom-call(f32[2,16,16,256]{2,1,3,0} %bitcast.220, f32[1,1,256,256]{1,0,2,3} %copy, f32[256]{0} %get-tuple-element.341), window={size=1x1}, dim_labels=b01f_01io->b01f, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_name="pmap(p_decode)/jit(main)/conv_general_dilated[window_strides=(1, 1) padding=((0, 0), (0, 0)) lhs_dilation=(1, 1) rhs_dilation=(1, 1) dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 3, 1, 2), rhs_spec=(3, 2, 0, 1), out_spec=(0, 3, 1, 2)) feature_group_count=1 batch_group_count=1 lhs_shape=(2, 16, 16, 256) rhs_shape=(1, 1, 256, 256) precision=None preferred_element_type=None]" source_file="/usr/local/lib/python3.8/dist-packages/flax/linen/linear.py" source_line=425}, backend_config="{\"conv_result_scale\":1,\"activation_mode\":\"0\",\"side_input_scale\":0}"

Original error: INTERNAL: All algorithms tried for %cudnn-conv-bias-activation.2 = (f32[2,16,16,256]{2,1,3,0}, u8[0]{0}) custom-call(f32[2,16,16,256]{2,1,3,0} %bitcast.220, f32[1,1,256,256]{1,0,2,3} %copy, f32[256]{0} %get-tuple-element.341), window={size=1x1}, dim_labels=b01f_01io->b01f, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_name="pmap(p_decode)/jit(main)/conv_general_dilated[window_strides=(1, 1) padding=((0, 0), (0, 0)) lhs_dilation=(1, 1) rhs_dilation=(1, 1) dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 3, 1, 2), rhs_spec=(3, 2, 0, 1), out_spec=(0, 3, 1, 2)) feature_group_count=1 batch_group_count=1 lhs_shape=(2, 16, 16, 256) rhs_shape=(1, 1, 256, 256) precision=None preferred_element_type=None]" source_file="/usr/local/lib/python3.8/dist-packages/flax/linen/linear.py" source_line=425}, backend_config="{\"conv_result_scale\":1,\"activation_mode\":\"0\",\"side_input_scale\":0}" failed. Falling back to default algorithm.  Per-algorithm errors:
  Profiling failure on cuDNN engine 1#TC: UNKNOWN: CUDNN_STATUS_ALLOC_FAILED
in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc(4839): 'status'
  Profiling failure on cuDNN engine 1: UNKNOWN: CUDNN_STATUS_ALLOC_FAILED
in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc(4839): 'status'
  Profiling failure on cuDNN engine 1#TC: UNKNOWN: CUDNN_STATUS_ALLOC_FAILED
in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc(4839): 'status'
  Profiling failure on cuDNN engine 1: UNKNOWN: CUDNN_STATUS_ALLOC_FAILED
in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc(4839): 'status'

To ignore this failure and try to use a fallback algorithm (which may have suboptimal performance), use XLA_FLAGS=--xla_gpu_strict_conv_algorithm_picker=false.  Please also file a bug for the root cause of failing autotuning.
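
For what it's worth, the per-algorithm CUDNN_STATUS_ALLOC_FAILED failures above typically indicate that the GPU ran out of memory while XLA was autotuning the convolution. Below is a minimal sketch of the two kinds of workaround, assuming they are applied before JAX is first imported: the XLA_FLAGS value is taken verbatim from the error message, while XLA_PYTHON_CLIENT_PREALLOCATE and XLA_PYTHON_CLIENT_MEM_FRACTION are JAX's standard GPU memory-allocation settings.

import os

# Workaround suggested by the error message: let XLA fall back to a default
# convolution algorithm (possibly slower) instead of failing autotuning.
os.environ["XLA_FLAGS"] = "--xla_gpu_strict_conv_algorithm_picker=false"

# Alternative: leave more free GPU memory for cuDNN's autotuning workspace,
# either by disabling preallocation or by lowering the preallocated fraction
# (see the comment below for a concrete fraction that reportedly works).
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"

# Both must be set before JAX initializes its GPU backend.
import jax
print(jax.devices())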

Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 10

Top GitHub Comments

2 reactions
metal3d commented, Jul 21, 2022

This works:

import os
# Must be set before jax is imported, so the GPU backend picks it up.
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "0.8"
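
Presumably this helps because JAX's GPU backend preallocates most of the device memory by default, so lowering XLA_PYTHON_CLIENT_MEM_FRACTION to 0.8 leaves headroom for cuDNN's autotuning workspace.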
0 reactions
patsybond172 commented, Dec 16, 2022

I receive this error too; the indications are that the GPU has run out of memory.


