Model with dynamic shapes and TensorRT optimization outputs nonsense
Description
We use a detector model (RetinaFace) with dynamic batch as well as dynamic height, width and output size. Originally it is a PyTorch model, which we converted to ONNX via torch.onnx.export, specifying dynamic_axes.
Link to the RetinaFace PyTorch implementation.
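For reference, a minimal sketch of such an export; the axis names, dummy input size and opset version are illustrative assumptions, not taken from the issue:

```python
# Sketch of exporting a detector with dynamic batch, height and width.
# The axis names and the dummy input size are illustrative assumptions.
dynamic_axes = {
    "input":  {0: "batch", 2: "height", 3: "width"},
    "loc":    {0: "batch", 1: "num_anchors"},
    "conf":   {0: "batch", 1: "num_anchors"},
    "landms": {0: "batch", 1: "num_anchors"},
}

def export_retinaface(model, path="retina_dynamic.onnx"):
    # torch is imported lazily so the axis declaration above can be
    # inspected without a PyTorch installation.
    import torch
    dummy = torch.randn(1, 3, 640, 640)
    torch.onnx.export(
        model, dummy, path,
        input_names=["input"],
        output_names=["loc", "conf", "landms"],
        dynamic_axes=dynamic_axes,
        opset_version=11,
    )
```

Any axis not listed in dynamic_axes is frozen at the dummy input's size, which is why omitting the height/width entries would silently bake 640x640 into the graph.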
When we deploy the ONNX model without the TensorRT optimization option, everything works as expected and the output is correctly interpretable for different input image dimensions. E.g. with an input of shape [1, 3, 1920, 1920], the output variable loc has shape [1, 151200, 4], which is expected.
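The 151200 rows are consistent with RetinaFace's anchor layout. Assuming the common configuration of three FPN levels at strides 8, 16 and 32 with two anchors per spatial location (an assumption; the linked implementation's defaults may differ), the count can be reproduced as:

```python
# Reproduce the expected number of anchor rows for an HxW input,
# assuming three feature-map levels (strides 8, 16, 32) with
# 2 anchors per spatial location, as in common RetinaFace configs.
def num_anchors(height, width, strides=(8, 16, 32), anchors_per_cell=2):
    total = 0
    for s in strides:
        total += (height // s) * (width // s) * anchors_per_cell
    return total

print(num_anchors(1920, 1920))  # 151200
```

This is also why the output shape must vary with input height and width: each feature-map level shrinks proportionally with the input.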
However, when we enable TensorRT optimization, we always get the same outputs, which make no sense. E.g. for the output variable loc, we get an output shape of [1, 6, 4]. The output also always has the same shape regardless of the input dimensions (the output shape should depend on height and width). The tritonserver docker logs contain no relevant warnings or errors.
Triton Information
Triton images from 20.08 to 20.10; they all suffer from the same issue.
Are you using the Triton container or did you build it yourself? Triton container.
To Reproduce
Download the ONNX model here - gdrive link
Triton configuration to go with the model:
name: "retina"
platform: "onnxruntime_onnx"
default_model_filename: "retina_dynamic.onnx"
max_batch_size: 2
input {
name: "input"
data_type: TYPE_FP32
dims: 3
dims: -1
dims: -1
}
output {
name: "landms"
data_type: TYPE_FP32
dims: -1
dims: 10
}
output {
name: "loc"
data_type: TYPE_FP32
dims: -1
dims: 4
}
output {
name: "conf"
data_type: TYPE_FP32
dims: -1
dims: 2
}
optimization {
execution_accelerators {gpu_execution_accelerator : [{name : "tensorrt"}]}
}
model_warmup {
name: "warmup_retina"
batch_size: 2
inputs {
key: "input"
value {
data_type: TYPE_FP32
dims: 3
dims: 1920
dims: 1920
random_data: true
}
}
}
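To rule out client-side interpretation issues, a request against Triton's KServe-v2-style HTTP/REST inference endpoint can be built with only the standard library. A sketch; the endpoint path follows the v2 protocol, while the model and input names are taken from the config above:

```python
import json

def build_infer_request(model_name, input_name, shape, data):
    """Build a KServe-v2-style JSON inference request body for
    POST /v2/models/<model_name>/infer on the Triton HTTP endpoint."""
    body = {
        "inputs": [{
            "name": input_name,
            "shape": list(shape),
            "datatype": "FP32",
            "data": data,
        }]
    }
    return "/v2/models/%s/infer" % model_name, json.dumps(body)

# Example: a tiny 1x3x2x2 dummy input; a real request would send
# a full 1x3xHxW image tensor flattened into the data list.
url, payload = build_infer_request("retina", "input", (1, 3, 2, 2), [0.0] * 12)
```

The response's per-output "shape" field makes it easy to compare the loc shape with and without the TensorRT accelerator enabled.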
Expected behavior
With TensorRT optimization, the warmup sample's output for variable loc has shape [1, 6, 4], which is incorrect.
Without optimization, the warmup sample's output for variable loc has shape [1, 151200, 4], which is correct.
Other
Some things we tried:
- Variation of optimization parameters (max_cached_engines, minimum_segment_size, max_workspace_size_bytes)
- Variation of torch.onnx.export parameters (opset version, constant folding)
- Fixed batch size
- Triton versions from 20.08 to 20.10 (the last one has a different CUDA dependency too)
- Fixed height and width dimensions: works correctly both with and without TensorRT optimization
Issue Analytics
- Created 3 years ago
- Reactions: 1
- Comments: 7 (1 by maintainers)
Top GitHub Comments
We have recently fixed a race condition in Triton that may relate to your issue. Can you retry with the latest Triton? Closing; please reopen when you report your findings.
@philipp-schmidt @mg515 I also serve the RetinaFace model above as ONNX with Triton.
name: "retinaface_model"
platform: "onnxruntime_onnx"
max_batch_size: 0
input [
  {
    name: "input0"
    data_type: TYPE_FP32
    format: FORMAT_NONE
    dims: [1,3,640,640]
    is_shape_tensor: false
    allow_ragged_batch: false
  }
]
output [
  {
    name: "output0"
    data_type: TYPE_FP32
    dims: [1,16800,16]
    label_filename: ""
    is_shape_tensor: false
  }
]
I don’t know how to use the output to draw the detection boxes. Do you know?