TensorRT-compatible RetinaNet
🚀 The feature
The possibility to compile an ONNX-exported RetinaNet model with TensorRT.
Motivation, pitch
I’m working with the torchvision RetinaNet implementation and have some production constraints regarding inference time. I think it would be great if the ONNX export of RetinaNet could be further compiled with TensorRT.
Alternatives
No response
Additional context
Actually, I already managed to make it work.
I exported the RetinaNet model to ONNX with `opset_version=11`, then compiled it with TensorRT 8.0.1.
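For reference, a minimal export sketch along these lines (the weights flag, input size, and output names below are placeholders, not taken from this issue):

```python
import torch
import torchvision

# Load a RetinaNet; depending on the torchvision version this may be
# `weights=...` instead of `pretrained=True`.
model = torchvision.models.detection.retinanet_resnet50_fpn(pretrained=True)
model.eval()

# Dummy input; the spatial size is only an example.
dummy = torch.randn(1, 3, 800, 800)

torch.onnx.export(
    model,
    dummy,
    "retinanet.onnx",
    opset_version=11,
    input_names=["images"],
    output_names=["boxes", "scores", "labels"],
)
```

The resulting file can then be passed to TensorRT's `trtexec`, e.g. `trtexec --onnx=retinanet.onnx --saveEngine=retinanet.engine`.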
To make this work, I bypassed two preprocessing steps in the `GeneralizedRCNNTransform` call (a sketch of the bypass follows the error logs below):
- `resize`, as it contains a `Floor` operator that is not compatible with TensorRT:
[09/08/2021-13:14:04] [E] [TRT] ModelImporter.cpp:725: ERROR: ModelImporter.cpp:179 In function parseGraph:
[6] Invalid Node - Resize_43
[graph.cpp::computeInputExecutionUses::519] Error Code 9: Internal Error (Floor_30: IUnaryLayer cannot be used to compute a shape tensor)
- `batch_images`, as it contains a `Pad` operator that is not compatible with TensorRT:
[09/08/2021-13:12:27] [E] [TRT] ModelImporter.cpp:725: ERROR: builtin_op_importers.cpp:2984 In function importPad:
[8] Assertion failed: inputs.at(1).is_weights() && "The input pads is required to be an initializer."
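One way to bypass these two steps before export (a sketch only, not necessarily what was done here) is to monkey-patch the transform with identity-like replacements, assuming the inputs are already resized and all share one shape, and that `model` is the RetinaNet instance from the export sketch above:

```python
import torch

def identity_resize(self, image, target=None):
    # Skip the resize (and its Floor op); the input is assumed to already
    # be at the target resolution.
    return image, target

def identity_batch_images(self, images, size_divisible=32):
    # Skip the padding (and its Pad op); all images are assumed to already
    # share the same shape.
    return torch.stack(images, dim=0)

# Bind the replacements to the model's transform before ONNX export.
model.transform.resize = identity_resize.__get__(model.transform)
model.transform.batch_images = identity_batch_images.__get__(model.transform)
```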
I also changed the dtype of the two `torch.arange` calls, in `shifts_x` and `shifts_y` of the `AnchorGenerator` call, from `torch.float32` to `torch.int32`, as this version of TensorRT only supports INT32 for the Range operator with dynamic inputs:
[09/08/2021-14:58:35] [E] [TRT] ModelImporter.cpp:725: ERROR: builtin_op_importers.cpp:3170 In function importRange:
[8] Assertion failed: inputs.at(0).isInt32() && "For range operator with dynamic inputs, this version of TensorRT only supports INT32!"
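To illustrate the requested change, here is a self-contained stand-in for the shift computation in `AnchorGenerator.grid_anchors` (this is a sketch, not the torchvision source):

```python
import torch

def make_shifts(grid_height, grid_width, stride_height, stride_width, device="cpu"):
    # int32 ranges keep the exported Range node INT32, which the TensorRT
    # ONNX parser accepts for dynamic inputs (float32 is rejected, see above).
    shifts_x = torch.arange(0, grid_width, dtype=torch.int32, device=device) * stride_width
    shifts_y = torch.arange(0, grid_height, dtype=torch.int32, device=device) * stride_height
    shift_y, shift_x = torch.meshgrid(shifts_y, shifts_x)
    # Cast back to float for the anchor arithmetic downstream.
    return shift_x.reshape(-1).float(), shift_y.reshape(-1).float()
```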
Finally, I bypassed the postprocessing operation of `RetinaNet`:
- `postprocess_detections`, as it contains a `where` operation (exported as a `NonZero` node) that is not compatible with TensorRT; a wrapper sketch follows the log below:
[09/08/2021-15:14:12] [I] [TRT] No importer registered for op: NonZero. Attempting to import as plugin.
[09/08/2021-15:14:12] [I] [TRT] Searching for plugin: NonZero, plugin_version: 1, plugin_namespace:
[09/08/2021-15:14:12] [E] [TRT] 3: getPluginCreator could not find plugin: NonZero version: 1
[09/08/2021-15:14:12] [E] [TRT] ModelImporter.cpp:720: While parsing node number 729 [NonZero -> "2086"]:
[09/08/2021-15:14:12] [E] [TRT] ModelImporter.cpp:721: --- Begin node ---
[09/08/2021-15:14:12] [E] [TRT] ModelImporter.cpp:722: input: "2085"
output: "2086"
name: "NonZero_729"
op_type: "NonZero"
[09/08/2021-15:14:12] [E] [TRT] ModelImporter.cpp:723: --- End node ---
[09/08/2021-15:14:12] [E] [TRT] ModelImporter.cpp:725: ERROR: builtin_op_importers.cpp:4643 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
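One way to stop before that post-processing (a sketch, not the exact code from this issue) is to export a thin wrapper that runs only the backbone and the detection head, so no `NonZero`/`where` ends up in the graph:

```python
import torch

class RetinaNetRawOutputs(torch.nn.Module):
    """Wrap a torchvision RetinaNet and return raw head outputs only,
    skipping the transform and postprocess_detections."""

    def __init__(self, retinanet):
        super().__init__()
        self.retinanet = retinanet

    def forward(self, images):
        # `images` is assumed to be an already preprocessed batch tensor.
        features = list(self.retinanet.backbone(images).values())
        head_outputs = self.retinanet.head(features)
        return head_outputs["cls_logits"], head_outputs["bbox_regression"]
```

The anchors can then be generated once offline, and box decoding, score thresholding, and NMS can be applied outside of the TensorRT engine.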
In my case it is fine to do the preprocessing and postprocessing outside of the `RetinaNet` call.
So my request actually only concerns the `AnchorGenerator`, i.e. changing the dtype of the `torch.arange` operations from `torch.float32` to `torch.int32`.
cc @datumbox
Top GitHub Comments
Hi @aurelien-m, of course 😉
Basically, what I did is replace some parts of the code with the identity. Here is the code that I used to achieve that.
The resulting ONNX should contain almost only the network itself, plus some anchor handling. This ONNX should be compilable by TensorRT.
That said, maybe a simpler way to achieve this would have been to simply replace the forward method with a “simpler” one, for example along the lines of the sketch below.
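The snippet referenced above is not reproduced in this thread; as a rough illustration of that simpler-forward alternative (the structure below is an assumption, not the author's code, and `model` is assumed to be a torchvision RetinaNet instance):

```python
def simple_forward(self, images):
    # Export-only forward: run the backbone and the detection head, skipping
    # GeneralizedRCNNTransform and postprocess_detections entirely.
    features = list(self.backbone(images).values())
    head_outputs = self.head(features)
    return head_outputs["cls_logits"], head_outputs["bbox_regression"]

# Bind the simplified forward to an existing RetinaNet instance before export.
model.forward = simple_forward.__get__(model)
```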
About the performance gain, I don’t remember exactly. Re-running an old comparison, I can tell you that, compiling the model with float16 and adding the preprocessing and postprocessing back, it is around 2 times faster than the original model (i.e. without the bypasses) exported to ONNX.
Hope it helps 😃
On my side, I was able to achieve a 2x to 3x speedup, depending on the hardware, going from PyTorch to TensorRT (I don’t have the exact numbers anymore, sorry!)