
Errors while generating Image with GPU


Errors while trying to generate images with GPU:

XlaRuntimeError                           Traceback (most recent call last)
Input In [13], in <cell line: 9>()
     23 encoded_images = encoded_images.sequences[..., 1:]
     24 # decode images
---> 25 decoded_images = p_decode(encoded_images, vqgan_params)
     26 decoded_images = decoded_images.clip(0.0, 1.0).reshape((-1, 256, 256, 3))
     27 for decoded_img in decoded_images:

    [... skipping hidden 15 frame]

File /usr/local/lib/python3.8/dist-packages/jax/_src/dispatch.py:713, in backend_compile(backend, built_c, options)
    709 @profiler.annotate_function
    710 def backend_compile(backend, built_c, options):
    711   # we use a separate function call to ensure that XLA compilation appears
    712   # separately in Python profiling results
--> 713   return backend.compile(built_c, compile_options=options)

XlaRuntimeError: UNKNOWN: Failed to determine best cudnn convolution algorithm for:
%cudnn-conv-bias-activation.2 = (f32[2,16,16,256]{2,1,3,0}, u8[0]{0}) custom-call(f32[2,16,16,256]{2,1,3,0} %bitcast.220, f32[1,1,256,256]{1,0,2,3} %copy, f32[256]{0} %get-tuple-element.341), window={size=1x1}, dim_labels=b01f_01io->b01f, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_name="pmap(p_decode)/jit(main)/conv_general_dilated[window_strides=(1, 1) padding=((0, 0), (0, 0)) lhs_dilation=(1, 1) rhs_dilation=(1, 1) dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 3, 1, 2), rhs_spec=(3, 2, 0, 1), out_spec=(0, 3, 1, 2)) feature_group_count=1 batch_group_count=1 lhs_shape=(2, 16, 16, 256) rhs_shape=(1, 1, 256, 256) precision=None preferred_element_type=None]" source_file="/usr/local/lib/python3.8/dist-packages/flax/linen/linear.py" source_line=425}, backend_config="{\"conv_result_scale\":1,\"activation_mode\":\"0\",\"side_input_scale\":0}"

Original error: INTERNAL: All algorithms tried for %cudnn-conv-bias-activation.2 = (f32[2,16,16,256]{2,1,3,0}, u8[0]{0}) custom-call(f32[2,16,16,256]{2,1,3,0} %bitcast.220, f32[1,1,256,256]{1,0,2,3} %copy, f32[256]{0} %get-tuple-element.341), window={size=1x1}, dim_labels=b01f_01io->b01f, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_name="pmap(p_decode)/jit(main)/conv_general_dilated[window_strides=(1, 1) padding=((0, 0), (0, 0)) lhs_dilation=(1, 1) rhs_dilation=(1, 1) dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 3, 1, 2), rhs_spec=(3, 2, 0, 1), out_spec=(0, 3, 1, 2)) feature_group_count=1 batch_group_count=1 lhs_shape=(2, 16, 16, 256) rhs_shape=(1, 1, 256, 256) precision=None preferred_element_type=None]" source_file="/usr/local/lib/python3.8/dist-packages/flax/linen/linear.py" source_line=425}, backend_config="{\"conv_result_scale\":1,\"activation_mode\":\"0\",\"side_input_scale\":0}" failed. Falling back to default algorithm.  Per-algorithm errors:
  Profiling failure on cuDNN engine 1#TC: UNKNOWN: CUDNN_STATUS_ALLOC_FAILED
in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc(4839): 'status'
  Profiling failure on cuDNN engine 1: UNKNOWN: CUDNN_STATUS_ALLOC_FAILED
in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc(4839): 'status'
  Profiling failure on cuDNN engine 1#TC: UNKNOWN: CUDNN_STATUS_ALLOC_FAILED
in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc(4839): 'status'
  Profiling failure on cuDNN engine 1: UNKNOWN: CUDNN_STATUS_ALLOC_FAILED
in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc(4839): 'status'

To ignore this failure and try to use a fallback algorithm (which may have suboptimal performance), use XLA_FLAGS=--xla_gpu_strict_conv_algorithm_picker=false.  Please also file a bug for the root cause of failing autotuning.
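
For what it's worth, the per-algorithm CUDNN_STATUS_ALLOC_FAILED failures above typically indicate that the GPU ran out of memory while XLA was autotuning the convolution. Below is a minimal sketch of the two kinds of workaround, assuming they are applied before JAX is first imported: the XLA_FLAGS value is taken verbatim from the error message, while XLA_PYTHON_CLIENT_PREALLOCATE and XLA_PYTHON_CLIENT_MEM_FRACTION are JAX's standard GPU memory-allocation settings.

import os

# Workaround suggested by the error message: let XLA fall back to a default
# convolution algorithm (possibly slower) instead of failing autotuning.
os.environ["XLA_FLAGS"] = "--xla_gpu_strict_conv_algorithm_picker=false"

# Alternative: leave more free GPU memory for cuDNN's autotuning workspace,
# either by disabling preallocation or by lowering the preallocated fraction
# (see the comment below for a concrete fraction that reportedly works).
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"

# Both must be set before JAX initializes its GPU backend.
import jax
print(jax.devices())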

Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 10

Top GitHub Comments

2 reactions
metal3d commented, Jul 21, 2022

This works:

import os
# Must be set before jax is imported, so the GPU backend picks it up.
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "0.8"
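
Presumably this helps because JAX's GPU backend preallocates most of the device memory by default, so lowering XLA_PYTHON_CLIENT_MEM_FRACTION to 0.8 leaves headroom for cuDNN's autotuning workspace.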
0 reactions
patsybond172 commented, Dec 16, 2022

I receive this error too; the indications are that the GPU has run out of memory.


