
TensorRT compatible retinanet

See original GitHub issue

🚀 The feature

The possibility to compile an ONNX-exported RetinaNet model with TensorRT.

Motivation, pitch

I’m working with the torchvision RetinaNet implementation and have production constraints on inference time. I think it would be great if the ONNX export of RetinaNet could be further compiled with TensorRT.

Alternatives

No response

Additional context

Actually, I already managed to make it work. I exported the RetinaNet model to ONNX with opset_version=11, then compiled it with TensorRT 8.0.1. To do that, I bypassed two preprocessing steps in the GeneralizedRCNNTransform call (a minimal sketch of the bypass follows this list):

  • resize, as it contains a Floor operator that is not compatible with TensorRT:
[09/08/2021-13:14:04] [E] [TRT] ModelImporter.cpp:725: ERROR: ModelImporter.cpp:179 In function parseGraph:
[6] Invalid Node - Resize_43
[graph.cpp::computeInputExecutionUses::519] Error Code 9: Internal Error (Floor_30: IUnaryLayer cannot be used to compute a shape tensor)
  • batch_images, as it contains a Pad operator that is not compatible with TensorRT:
[09/08/2021-13:12:27] [E] [TRT] ModelImporter.cpp:725: ERROR: builtin_op_importers.cpp:2984 In function importPad:
[8] Assertion failed: inputs.at(1).is_weights() && "The input pads is required to be an initializer."
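
For illustration, a minimal sketch of that bypass (the full export script appears in the comments below) replaces the two offending transform steps with identity functions before exporting. The retinanet_resnet50_fpn constructor is used here only as an example; any RetinaNet instance works the same way:

import torch
from torchvision.models.detection import retinanet_resnet50_fpn

# Illustrative only: any RetinaNet instance works; weights don't matter for the graph structure
model = retinanet_resnet50_fpn(pretrained=False).eval()

# resize normally computes scale factors with a Floor op; make it a no-op
model.transform.resize = lambda image, target: (image, target)

# batch_images normally pads images to a common size (Pad op); just take the single image
# (the size_divisible keyword is only passed by some torchvision versions, hence the default)
model.transform.batch_images = lambda images, size_divisible=None: images[0].unsqueeze(0)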

I also changed the dtype of the two torch.arange calls, in shifts_x and shifts_y of the AnchorGenerator, from torch.float32 to torch.int32, as the current version of TensorRT only supports INT32 for Range operators with dynamic inputs:

[09/08/2021-14:58:35] [E] [TRT] ModelImporter.cpp:725: ERROR: builtin_op_importers.cpp:3170 In function importRange:
[8] Assertion failed: inputs.at(0).isInt32() && "For range operator with dynamic inputs, this version of TensorRT only supports INT32!"
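
For illustration only (the variable names and sizes below are made up, this is not the actual torchvision code), the requested change amounts to generating the anchor shifts with an int32 arange, so the exported Range node is INT32, and casting back to float32 before scaling by the stride:

import torch

# Example feature-map size and stride for one pyramid level (illustrative values)
grid_width, grid_height = 64, 48
stride_width, stride_height = 8, 8

# int32 arange -> ONNX Range with INT32 inputs, which TensorRT accepts;
# cast back to float32 before multiplying by the stride
shifts_x = torch.arange(0, grid_width, dtype=torch.int32).to(torch.float32) * stride_width
shifts_y = torch.arange(0, grid_height, dtype=torch.int32).to(torch.float32) * stride_height
shift_y, shift_x = torch.meshgrid(shifts_y, shifts_x)
shifts = torch.stack((shift_x.reshape(-1), shift_y.reshape(-1),
                      shift_x.reshape(-1), shift_y.reshape(-1)), dim=1)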

And finally I bypassed the postprocessing operation of RetinaNet:

[09/08/2021-15:14:12] [I] [TRT] No importer registered for op: NonZero. Attempting to import as plugin.
[09/08/2021-15:14:12] [I] [TRT] Searching for plugin: NonZero, plugin_version: 1, plugin_namespace: 
[09/08/2021-15:14:12] [E] [TRT] 3: getPluginCreator could not find plugin: NonZero version: 1
[09/08/2021-15:14:12] [E] [TRT] ModelImporter.cpp:720: While parsing node number 729 [NonZero -> "2086"]:
[09/08/2021-15:14:12] [E] [TRT] ModelImporter.cpp:721: --- Begin node ---
[09/08/2021-15:14:12] [E] [TRT] ModelImporter.cpp:722: input: "2085"
output: "2086"
name: "NonZero_729"
op_type: "NonZero"

[09/08/2021-15:14:12] [E] [TRT] ModelImporter.cpp:723: --- End node ---
[09/08/2021-15:14:12] [E] [TRT] ModelImporter.cpp:725: ERROR: builtin_op_importers.cpp:4643 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
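
Again as a rough sketch (the full version is in the comments below), the postprocessing bypass turns postprocess_detections and transform.postprocess into pass-throughs, so the NonZero-based filtering never ends up in the ONNX graph:

# Continuing the sketch above with the same `model` instance:
# return the raw head outputs and anchors instead of decoded, filtered boxes
model.postprocess_detections = lambda head_outputs, anchors, image_shapes: {
    "split_head_outputs": head_outputs,
    "split_anchors": anchors,
}
# skip the resize-back-to-original-image-size step
model.transform.postprocess = lambda detections, image_shapes, original_sizes: detections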

In my case it is fine to run preprocessing and postprocessing outside of the RetinaNet call. So my request is really only about the AnchorGenerator, i.e. changing the dtype of the torch.arange operations from torch.float32 to torch.int32.

cc @datumbox


Top GitHub Comments

julienripoche commented on Apr 2, 2022 (2 reactions)

Hi @aurelien-m, of course 😉

Basically, what I did was replace some parts of the code with identity functions. Here is the code I used to achieve that.

import torch

# Load retinanet
pth_path = "/path/to/retinanet.pth"
retinanet = torch.load(pth_path, map_location="cpu")
retinanet.eval()

# Image sizes
original_image_size = (677, 511)

# Normalize hack
normalize_tmp = retinanet.transform.normalize
retinanet_normalize = lambda x: normalize_tmp(x)
retinanet.transform.normalize = lambda x: x

# Resize hack
resize_tmp = retinanet.transform.resize
retinanet_resize = lambda x: resize_tmp(x, None)[0]
retinanet.transform.resize = lambda x, y: (x, y)

# Batch images hack
# /!\ torchvision version dependent ???
# retinanet.transform.batch_images = lambda x, size_divisible: x[0].unsqueeze(0)
retinanet.transform.batch_images = lambda x: x[0].unsqueeze(0)

# Generate dummy input
def preprocess_image(img):
    result = retinanet_resize(retinanet_normalize(img)[0]).unsqueeze(0)
    return result
dummy_input = torch.randn(1, 3, original_image_size[0], original_image_size[1])
dummy_input = preprocess_image(dummy_input)
image_size = tuple(dummy_input.shape[2:])
print(dummy_input.shape)

# Postprocess detections hack
postprocess_detections_tmp = retinanet.postprocess_detections
retinanet_postprocess_detections = lambda x: postprocess_detections_tmp(x["split_head_outputs"], x["split_anchors"], [image_size])
retinanet.postprocess_detections = lambda x, y, z: {"split_head_outputs": x, "split_anchors": y}

# Postprocess hack
postprocess_tmp = retinanet.transform.postprocess
retinanet_postprocess = lambda x: postprocess_tmp(x, [image_size], [original_image_size])
retinanet.transform.postprocess = lambda x, y, z: x

# ONNX export
onnx_path = "/path/to/retinanet.onnx"
torch.onnx.export(
    retinanet,
    dummy_input,
    onnx_path,
    verbose=False,
    opset_version=11,
    input_names=["images"],
)

The resulting ONNX should contain almost only the network itself, plus some anchor handling. This ONNX should be compilable by TensorRT.
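
For reference, building an engine from that file might look roughly like this with the TensorRT 8 Python API (a sketch only; the trtexec command-line tool with --onnx and --fp16 does the same job). The paths are placeholders:

import tensorrt as trt

onnx_path = "/path/to/retinanet.onnx"    # file produced by the export above
engine_path = "/path/to/retinanet.plan"  # illustrative output path

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Parse the ONNX file and surface any unsupported-operator errors
with open(onnx_path, "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parsing failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # build in float16

serialized_engine = builder.build_serialized_network(network, config)
with open(engine_path, "wb") as f:
    f.write(serialized_engine)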

That said, maybe a simpler way to achieve this would have been to simply replace the forward method with a “simpler” one.
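
As a rough sketch of that alternative (a hypothetical wrapper, assuming torchvision’s RetinaNet exposes backbone and head attributes as in the current implementation; anchor generation and box decoding stay outside the exported graph):

import torch
from torch import nn

class RawRetinaNet(nn.Module):
    """Hypothetical wrapper with a 'simpler' forward: backbone + head only,
    no GeneralizedRCNNTransform, no anchor generation, no postprocessing."""

    def __init__(self, retinanet):
        super().__init__()
        self.backbone = retinanet.backbone
        self.head = retinanet.head

    def forward(self, images):
        # images: already normalized and resized, shape (N, 3, H, W)
        features = list(self.backbone(images).values())
        head_outputs = self.head(features)
        return head_outputs["cls_logits"], head_outputs["bbox_regression"]

# Usage, reusing the model and dummy input from the script above:
# raw_model = RawRetinaNet(retinanet).eval()
# torch.onnx.export(raw_model, dummy_input, "/path/to/retinanet_raw.onnx", opset_version=11)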

About the performance gain, I don’t remember exactly. Re-running an old comparison, I can tell you that with the model compiled in float16, plus the external preprocessing and postprocessing added back, it runs around 2 times faster than the original model (i.e. without the bypasses) exported to ONNX.

Hope it helps 😃

aurelien-m commented on Apr 26, 2022

Is there any info on the improvement in inference time/latency with the exported TensorRT engine compared to ONNX or PyTorch?

On my side, I was able to achieve a 2x to 3x speedup depending on the hardware, going from PyTorch to TensorRT (I don’t have the exact numbers anymore, sorry!).
