TensorRT-compatible RetinaNet
🚀 The feature
The possibility to compile an ONNX-exported RetinaNet model with TensorRT.
Motivation, pitch
I’m working with the torchvision RetinaNet implementation and have some production constraints regarding inference time. I think it would be great if the ONNX export of RetinaNet could be further compiled with TensorRT.
Alternatives
No response
Additional context
Actually, I already managed to make it work.
I exported the RetinaNet model to ONNX with `opset_version=11`, then compiled it with TensorRT 8.0.1.
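For reference, a minimal export sketch along these lines (the weights flag, input size, and output names below are placeholders, not taken from this issue):

```python
import torch
import torchvision

# Load a RetinaNet; depending on the torchvision version this may be
# `weights=...` instead of `pretrained=True`.
model = torchvision.models.detection.retinanet_resnet50_fpn(pretrained=True)
model.eval()

# Dummy input; the spatial size is only an example.
dummy = torch.randn(1, 3, 800, 800)

torch.onnx.export(
    model,
    dummy,
    "retinanet.onnx",
    opset_version=11,
    input_names=["images"],
    output_names=["boxes", "scores", "labels"],
)
```

The resulting file can then be passed to TensorRT's `trtexec`, e.g. `trtexec --onnx=retinanet.onnx --saveEngine=retinanet.engine`.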
To make this work, I bypassed two preprocessing steps in the `GeneralizedRCNNTransform` call (a sketch of the bypass follows the error logs below):
- `resize`, as it contains a `Floor` operator that is not compatible with TensorRT:
[09/08/2021-13:14:04] [E] [TRT] ModelImporter.cpp:725: ERROR: ModelImporter.cpp:179 In function parseGraph:
[6] Invalid Node - Resize_43
[graph.cpp::computeInputExecutionUses::519] Error Code 9: Internal Error (Floor_30: IUnaryLayer cannot be used to compute a shape tensor)
- `batch_images`, as it contains a `Pad` operator that is not compatible with TensorRT:
[09/08/2021-13:12:27] [E] [TRT] ModelImporter.cpp:725: ERROR: builtin_op_importers.cpp:2984 In function importPad:
[8] Assertion failed: inputs.at(1).is_weights() && "The input pads is required to be an initializer."
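One way to bypass these two steps before export (a sketch only, not necessarily what was done here) is to monkey-patch the transform with identity-like replacements, assuming the inputs are already resized and all share one shape, and that `model` is the RetinaNet instance from the export sketch above:

```python
import torch

def identity_resize(self, image, target=None):
    # Skip the resize (and its Floor op); the input is assumed to already
    # be at the target resolution.
    return image, target

def identity_batch_images(self, images, size_divisible=32):
    # Skip the padding (and its Pad op); all images are assumed to already
    # share the same shape.
    return torch.stack(images, dim=0)

# Bind the replacements to the model's transform before ONNX export.
model.transform.resize = identity_resize.__get__(model.transform)
model.transform.batch_images = identity_batch_images.__get__(model.transform)
```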
I also changed the dtype of the two `torch.arange` calls, in `shifts_x` and `shifts_y` of the `AnchorGenerator` call, from `torch.float32` to `torch.int32`, as this version of TensorRT only supports INT32 for the Range operator with dynamic inputs:
[09/08/2021-14:58:35] [E] [TRT] ModelImporter.cpp:725: ERROR: builtin_op_importers.cpp:3170 In function importRange:
[8] Assertion failed: inputs.at(0).isInt32() && "For range operator with dynamic inputs, this version of TensorRT only supports INT32!"
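To illustrate the requested change, here is a self-contained stand-in for the shift computation in `AnchorGenerator.grid_anchors` (this is a sketch, not the torchvision source):

```python
import torch

def make_shifts(grid_height, grid_width, stride_height, stride_width, device="cpu"):
    # int32 ranges keep the exported Range node INT32, which the TensorRT
    # ONNX parser accepts for dynamic inputs (float32 is rejected, see above).
    shifts_x = torch.arange(0, grid_width, dtype=torch.int32, device=device) * stride_width
    shifts_y = torch.arange(0, grid_height, dtype=torch.int32, device=device) * stride_height
    shift_y, shift_x = torch.meshgrid(shifts_y, shifts_x)
    # Cast back to float for the anchor arithmetic downstream.
    return shift_x.reshape(-1).float(), shift_y.reshape(-1).float()
```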
Finally, I bypassed the postprocessing operation of `RetinaNet`:
- `postprocess_detections`, as it contains a `where` operation (exported as a `NonZero` node) that is not compatible with TensorRT; a wrapper sketch follows the log below:
[09/08/2021-15:14:12] [I] [TRT] No importer registered for op: NonZero. Attempting to import as plugin.
[09/08/2021-15:14:12] [I] [TRT] Searching for plugin: NonZero, plugin_version: 1, plugin_namespace:
[09/08/2021-15:14:12] [E] [TRT] 3: getPluginCreator could not find plugin: NonZero version: 1
[09/08/2021-15:14:12] [E] [TRT] ModelImporter.cpp:720: While parsing node number 729 [NonZero -> "2086"]:
[09/08/2021-15:14:12] [E] [TRT] ModelImporter.cpp:721: --- Begin node ---
[09/08/2021-15:14:12] [E] [TRT] ModelImporter.cpp:722: input: "2085"
output: "2086"
name: "NonZero_729"
op_type: "NonZero"
[09/08/2021-15:14:12] [E] [TRT] ModelImporter.cpp:723: --- End node ---
[09/08/2021-15:14:12] [E] [TRT] ModelImporter.cpp:725: ERROR: builtin_op_importers.cpp:4643 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
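One way to stop before that post-processing (a sketch, not the exact code from this issue) is to export a thin wrapper that runs only the backbone and the detection head, so no `NonZero`/`where` ends up in the graph:

```python
import torch

class RetinaNetRawOutputs(torch.nn.Module):
    """Wrap a torchvision RetinaNet and return raw head outputs only,
    skipping the transform and postprocess_detections."""

    def __init__(self, retinanet):
        super().__init__()
        self.retinanet = retinanet

    def forward(self, images):
        # `images` is assumed to be an already preprocessed batch tensor.
        features = list(self.retinanet.backbone(images).values())
        head_outputs = self.retinanet.head(features)
        return head_outputs["cls_logits"], head_outputs["bbox_regression"]
```

The anchors can then be generated once offline, and box decoding, score thresholding, and NMS can be applied outside of the TensorRT engine.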
In my case it is fine to do the preprocessing and postprocessing outside of the `RetinaNet` call.
So my request actually only concerns the `AnchorGenerator`, i.e. changing the dtype of the `torch.arange` operations from `torch.float32` to `torch.int32`.
cc @datumbox
Top GitHub Comments
Hi @aurelien-m, of course 😉
Basically, what I did is replace some parts of the code with the identity. Here is the code that I used to achieve that.
The resulting ONNX should contain almost only the network itself, plus some anchor handling. This ONNX should be compilable by TensorRT.
That said, maybe a simpler way to achieve this would have been to simply replace the forward method with a “simpler” one, for example along the lines of the sketch below.
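The snippet referenced above is not reproduced in this thread; as a rough illustration of that simpler-forward alternative (the structure below is an assumption, not the author's code, and `model` is assumed to be a torchvision RetinaNet instance):

```python
def simple_forward(self, images):
    # Export-only forward: run the backbone and the detection head, skipping
    # GeneralizedRCNNTransform and postprocess_detections entirely.
    features = list(self.backbone(images).values())
    head_outputs = self.head(features)
    return head_outputs["cls_logits"], head_outputs["bbox_regression"]

# Bind the simplified forward to an existing RetinaNet instance before export.
model.forward = simple_forward.__get__(model)
```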
About the performance gain, I don’t remember exactly. Re-running an old comparison, I can tell you that, compiling the model with float16 and adding the preprocessing and postprocessing back, it is around 2 times faster than the original model (i.e. without the bypasses) exported to ONNX.
Hope it helps 😃
On my side, I was able to achieve a 2x to 3x speedup, depending on the hardware, going from PyTorch to TensorRT (I don’t have the exact numbers anymore, sorry!)