Model with dynamic shapes and TensorRT optimization outputs nonsense
Description
We use a detector model (RetinaFace) with dynamic batch as well as dynamic height, width and output size. Originally it is a PyTorch model, which we converted to ONNX via torch.onnx.export, specifying dynamic_axes.
Link to the RetinaFace PyTorch implementation.
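For reference, a minimal sketch of such an export; the axis names, dummy input size and opset version are illustrative assumptions, not taken from the issue:

```python
# Sketch of exporting a detector with dynamic batch, height and width.
# The axis names and the dummy input size are illustrative assumptions.
dynamic_axes = {
    "input":  {0: "batch", 2: "height", 3: "width"},
    "loc":    {0: "batch", 1: "num_anchors"},
    "conf":   {0: "batch", 1: "num_anchors"},
    "landms": {0: "batch", 1: "num_anchors"},
}

def export_retinaface(model, path="retina_dynamic.onnx"):
    # torch is imported lazily so the axis declaration above can be
    # inspected without a PyTorch installation.
    import torch
    dummy = torch.randn(1, 3, 640, 640)
    torch.onnx.export(
        model, dummy, path,
        input_names=["input"],
        output_names=["loc", "conf", "landms"],
        dynamic_axes=dynamic_axes,
        opset_version=11,
    )
```

Any axis not listed in dynamic_axes is frozen at the dummy input's size, which is why omitting the height/width entries would silently bake 640x640 into the graph.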
When we deploy the ONNX model without the TensorRT optimization option, everything works as expected and the output is correctly interpretable for different input image dimensions. E.g. with an input of shape [1, 3, 1920, 1920], the output variable loc has shape [1, 151200, 4], which is expected.
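The 151200 rows are consistent with RetinaFace's anchor layout. Assuming the common configuration of three FPN levels at strides 8, 16 and 32 with two anchors per spatial location (an assumption; the linked implementation's defaults may differ), the count can be reproduced as:

```python
# Reproduce the expected number of anchor rows for an HxW input,
# assuming three feature-map levels (strides 8, 16, 32) with
# 2 anchors per spatial location, as in common RetinaFace configs.
def num_anchors(height, width, strides=(8, 16, 32), anchors_per_cell=2):
    total = 0
    for s in strides:
        total += (height // s) * (width // s) * anchors_per_cell
    return total

print(num_anchors(1920, 1920))  # 151200
```

This is also why the output shape must vary with input height and width: each feature-map level shrinks proportionally with the input.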
However, when we enable TensorRT optimization, we always get the same outputs, which make no sense. E.g. for the output variable loc, we get an output shape of [1, 6, 4]. The output also always has the same shape regardless of the input dimensions (the output shape should depend on height and width). The tritonserver docker logs contain no relevant warnings or errors.
Triton Information
Triton images from 20.08 to 20.10; they all suffer from the same issue.
Are you using the Triton container or did you build it yourself? Triton container.
To Reproduce
Download the ONNX model here - gdrive link
Triton configuration to go with the model:
name: "retina"
platform: "onnxruntime_onnx"
default_model_filename: "retina_dynamic.onnx"
max_batch_size: 2
input {
name: "input"
data_type: TYPE_FP32
dims: 3
dims: -1
dims: -1
}
output {
name: "landms"
data_type: TYPE_FP32
dims: -1
dims: 10
}
output {
name: "loc"
data_type: TYPE_FP32
dims: -1
dims: 4
}
output {
name: "conf"
data_type: TYPE_FP32
dims: -1
dims: 2
}
optimization {
execution_accelerators {gpu_execution_accelerator : [{name : "tensorrt"}]}
}
model_warmup {
name: "warmup_retina"
batch_size: 2
inputs {
key: "input"
value {
data_type: TYPE_FP32
dims: 3
dims: 1920
dims: 1920
random_data: true
}
}
}
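To rule out client-side interpretation issues, a request against Triton's KServe-v2-style HTTP/REST inference endpoint can be built with only the standard library. A sketch; the endpoint path follows the v2 protocol, while the model and input names are taken from the config above:

```python
import json

def build_infer_request(model_name, input_name, shape, data):
    """Build a KServe-v2-style JSON inference request body for
    POST /v2/models/<model_name>/infer on the Triton HTTP endpoint."""
    body = {
        "inputs": [{
            "name": input_name,
            "shape": list(shape),
            "datatype": "FP32",
            "data": data,
        }]
    }
    return "/v2/models/%s/infer" % model_name, json.dumps(body)

# Example: a tiny 1x3x2x2 dummy input; a real request would send
# a full 1x3xHxW image tensor flattened into the data list.
url, payload = build_infer_request("retina", "input", (1, 3, 2, 2), [0.0] * 12)
```

The response's per-output "shape" field makes it easy to compare the loc shape with and without the TensorRT accelerator enabled.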
Expected behavior
With TensorRT optimization, the warmup sample's output for variable loc has shape [1, 6, 4], which is incorrect.
Without optimization, the warmup sample's output for variable loc has shape [1, 151200, 4], which is correct.
Other
Some things we tried:
- Variation of optimization parameters (max_cached_engines, minimum_segment_size, max_workspace_size_bytes)
- Variation of torch.onnx.export parameters (opset version, constant folding)
- Fixed batch size
- Triton versions from 20.08 to 20.10 (the last one has a different CUDA dependency too)
- Fixed height and width dimensions: works correctly both with and without TensorRT optimization
Issue Analytics
- Created 3 years ago
- Reactions: 1
- Comments: 7 (1 by maintainers)
Top GitHub Comments
We have recently fixed a race condition in Triton that may relate to your issue. Can you retry with the latest Triton? Closing; please reopen when you report your findings.
@philipp-schmidt @mg515 I also serve the RetinaFace model above as ONNX with Triton.
name: "retinaface_model"
platform: "onnxruntime_onnx"
max_batch_size: 0
input [
  {
    name: "input0"
    data_type: TYPE_FP32
    format: FORMAT_NONE
    dims: [1,3,640,640]
    is_shape_tensor: false
    allow_ragged_batch: false
  }
]
output [
  {
    name: "output0"
    data_type: TYPE_FP32
    dims: [1,16800,16]
    label_filename: ""
    is_shape_tensor: false
  }
]
I don’t know how to use the output to draw the detection boxes. Do you know?