Model with dynamic shapes and TensorRT optimization outputs nonsense

See original GitHub issue

Description
We use a detector model (RetinaFace) with a dynamic batch size as well as dynamic height, width, and output size. It is originally a PyTorch model, which we converted to ONNX via torch.onnx.export, specifying dynamic_axes. Link to the RetinaFace PyTorch implementation.
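
A minimal sketch of such an export, assuming the graph input is named "input" and the outputs "loc", "conf", and "landms" to match the Triton config below; the model constructor and opset version are placeholders rather than the exact call used here:

import torch

# model = RetinaFace(cfg=cfg_re50, phase='test')  # built from the linked repository
dummy = torch.randn(1, 3, 640, 640)
torch.onnx.export(
    model, dummy, "retina_dynamic.onnx",
    input_names=["input"],
    output_names=["loc", "conf", "landms"],
    dynamic_axes={                        # free axes: batch, spatial dims, prior count
        "input":  {0: "batch", 2: "height", 3: "width"},
        "loc":    {0: "batch", 1: "num_priors"},
        "conf":   {0: "batch", 1: "num_priors"},
        "landms": {0: "batch", 1: "num_priors"},
    },
    opset_version=11,                     # placeholder; several opsets were tried (see Other)
)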

When we deploy the ONNX model without the TensorRT optimization option, everything works as expected and the output is correctly interpretable for different input image dimensions. E.g., with an input of shape [1, 3, 1920, 1920], the output variable loc has shape [1, 151200, 4], as expected.
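
That shape is consistent with the usual RetinaFace anchor layout. Assuming FPN strides of 8, 16, and 32 with two anchors per feature-map cell, as in the linked implementation, the prior count for a 1920x1920 input works out to:

h = w = 1920
priors = sum((h // s) * (w // s) * 2 for s in (8, 16, 32))
print(priors)  # 151200, hence loc of shape [1, 151200, 4]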

However, when we enable the TensorRT optimization, we always get the same outputs, which make no sense. E.g., for the output variable loc we get shape [1, 6, 4]. The output shape is also identical regardless of the input dimensions, even though it should depend on height and width. The tritonserver docker logs show no relevant warnings or errors.
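
The mismatch can be observed with a plain client-side shape check; a minimal sketch, assuming the server is reachable over HTTP on localhost:8000 and the tritonclient pip package is installed:

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
img = np.random.rand(1, 3, 1920, 1920).astype(np.float32)

inp = httpclient.InferInput("input", list(img.shape), "FP32")
inp.set_data_from_numpy(img)
outs = [httpclient.InferRequestedOutput(n) for n in ("loc", "conf", "landms")]

res = client.infer("retina", inputs=[inp], outputs=outs)
for n in ("loc", "conf", "landms"):
    # ONNX Runtime alone: loc -> (1, 151200, 4); with the TensorRT accelerator: (1, 6, 4)
    print(n, res.as_numpy(n).shape)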

Triton Information
Triton images from 20.08 to 20.10; they all suffer from the same issue.

Are you using the Triton container or did you build it yourself? Triton container.

To Reproduce
Download the ONNX model here - gdrive link

Triton configuration to go with the model:

name: "retina"
platform: "onnxruntime_onnx"
default_model_filename: "retina_dynamic.onnx"
max_batch_size: 2
input {
  name: "input"
  data_type: TYPE_FP32
  dims: 3
  dims: -1
  dims: -1
}
output {
  name: "landms"
  data_type: TYPE_FP32
  dims: -1
  dims: 10
}
output {
  name: "loc"
  data_type: TYPE_FP32
  dims: -1
  dims: 4
}
output {
  name: "conf"
  data_type: TYPE_FP32
  dims: -1
  dims: 2
}
optimization {
  execution_accelerators {gpu_execution_accelerator : [{name : "tensorrt"}]}
}
model_warmup {
  name: "warmup_retina"
  batch_size: 2
  inputs {
    key: "input"
    value {
      data_type: TYPE_FP32
      dims: 3
      dims:  1920
      dims: 1920
      random_data: true
    }
  }
}

Expected behavior
With TensorRT optimization, the output of the warmup sample for variable loc is [1, 6, 4], which is incorrect. Without optimization, it is [1, 151200, 4], which is correct.

Other
Some things we tried:

  • variation of the optimization parameters (max_cached_engines, minimum_segment_size, max_workspace_size_bytes)
  • variation of the torch.onnx.export parameters (opset version, constant folding); a sketch for checking that the exported axes stayed dynamic follows this list
  • a fixed batch size
  • Triton versions from 20.08 to 20.10; the last one has a different CUDA dependency, too
  • fixed height and width dimensions, which work correctly both with and without TensorRT optimization
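
As referenced in the list above, a quick way to confirm that the exported model kept its axes dynamic; a diagnostic sketch assuming the onnx pip package, not part of the original report:

import onnx

m = onnx.load("retina_dynamic.onnx")
for v in list(m.graph.input) + list(m.graph.output):
    dims = [d.dim_param or d.dim_value
            for d in v.type.tensor_type.shape.dim]
    print(v.name, dims)  # dynamic axes print as names, e.g. ['batch', 3, 'height', 'width']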

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 7 (1 by maintainers)

Top GitHub Comments

4 reactions
deadeyegoodwin commented, Jan 26, 2021

We have recently fixed a race condition in Triton that may relate to your issue. Can you retry with the latest Triton? Closing; reopen when you report your findings.

0 reactions
seawater668 commented, Dec 17, 2021

@philipp-schmidt @mg515 I also serve the RetinaFace model above as ONNX on Triton.

name: "retinaface_model"
platform: "onnxruntime_onnx"
max_batch_size: 0
input [
  {
    name: "input0"
    data_type: TYPE_FP32
    format: FORMAT_NONE
    dims: [ 1, 3, 640, 640 ]
    is_shape_tensor: false
    allow_ragged_batch: false
  }
]
output [
  {
    name: "output0"
    data_type: TYPE_FP32
    dims: [ 1, 16800, 16 ]
    label_filename: ""
    is_shape_tensor: false
  }
]

I don’t know how to use the output to draw the detection boxes. Do you know?
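
For what it's worth, the linked implementation decodes boxes with a standard SSD-style transform over a prior grid; a hedged sketch in numpy, assuming the fused [1, 16800, 16] output packs loc(4) + conf(2) + landms(10) per prior (worth verifying against the export) and that the priors come from the repo's PriorBox for a 640x640 input (16800 = (80*80 + 40*40 + 20*20) * 2 anchors):

import numpy as np

def decode_boxes(loc, priors, variances=(0.1, 0.2)):
    # SSD-style decode as in the repo's utils/box_utils.py: offsets + priors -> boxes
    boxes = np.concatenate((
        priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:],
        priors[:, 2:] * np.exp(loc[:, 2:] * variances[1])), axis=1)
    boxes[:, :2] -= boxes[:, 2:] / 2   # center -> top-left corner
    boxes[:, 2:] += boxes[:, :2]       # width/height -> bottom-right corner
    return boxes                       # normalized xyxy coordinates

fused = np.random.rand(16800, 16).astype(np.float32)   # stand-in for output0[0]
priors = np.random.rand(16800, 4).astype(np.float32)   # stand-in for the PriorBox grid
loc, conf, landms = fused[:, :4], fused[:, 4:6], fused[:, 6:]
boxes = decode_boxes(loc, priors)

After decoding, scale the boxes by the image width and height, threshold on conf, and run NMS before drawing.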

Top Results From Across the Web

  • TensorRT dynamic shape err: [slot.h::decode::151] Error Code ...
    Description I got this error when I use TensorRT inference Resnet50 with dynamic batch. I set multiple dynamic batch profiles : [[1, 1,...
  • PyTorch 1.10 - Hacker News
    It compiles your model, using TensorRT, Ahead of Time and enables you to use the compiled model through torch.jit.load("your_trtorch_model.ts") ...
  • Distributed Training with DTensors | TensorFlow Core
    In this tutorial, you will train a Sentiment Analysis model with DTensor. Three distributed training schemes are demonstrated with this example: Data Parallel ...
  • [D] Should We Be Using JAX in 2022? : r/MachineLearning
    Do you think higher-order optimization being easier with JAX will be ... If you have a highly dynamic model with changing tensor shapes, ...
  • Diff - platform/external/tensorflow - Google Git
    The output gradients are used if not empty and not // null. ... b/tensorflow/compiler/jit/kernels/xla_ops.cc new file mode 100644 index 0000000..c483841 ...
