
Failed to create ArcFace TensorRT model with TensorRT 20.10

See original GitHub issue

Hi, I was trying to convert the ArcFace model to a TensorRT plan using build_insight_trt.py and TensorRT 20.10, but it failed with the following error:

[TensorRT] ERROR: (Unnamed Layer* 480) [Shuffle]: at most one dimension may be inferred
ERROR: Failed to parse the ONNX file: /models/onnx/arcface_r100_v1/arcface_r100_v1.onnx.tmp
In node -1 (scaleHelper): UNSUPPORTED_NODE: Assertion failed: dims.nbDims == 4 || dims.nbDims == 5

I need this because Triton supports custom Python backends only from 20.10 onward (not 20.09), and a model converted with TensorRT 20.09 is unsupported by Triton 20.10 (Triton 20.10 requires TensorRT 20.10 engines).

Can this be fixed?

Also, thanks for the amazing work!

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 23 (13 by maintainers)

Top GitHub Comments

1 reaction
SthPhoenix commented on Dec 6, 2020

An update: I ran with verbose output, and apparently the error is in the conversion of the BatchNorm layer.

[TensorRT] VERBOSE: ModelImporter.cpp:119: Searching for input: fc1_beta
[TensorRT] VERBOSE: ModelImporter.cpp:119: Searching for input: fc1_moving_mean
[TensorRT] VERBOSE: ModelImporter.cpp:119: Searching for input: fc1_moving_var
[TensorRT] VERBOSE: ModelImporter.cpp:125: fc1 [BatchNormalization] inputs: [pre_fc1 -> (-1, -1)], [fc1_gamma -> (512)], [fc1_beta -> (512)], [fc1_moving_mean -> (512)], [fc1_moving_var -> (512)], 
[TensorRT] ERROR: (Unnamed Layer* 480) [Shuffle]: at most one dimension may be inferred
ERROR: Failed to parse the ONNX file: /models/onnx/arcface_r100_v1/arcface_r100_v1.onnx.tmp
In node -1 (scaleHelper): UNSUPPORTED_NODE: Assertion failed: dims.nbDims == 4 || dims.nbDims == 5

Apparently BatchNorm supports only 4- or 5-dimensional inputs. I found a similar issue in the onnx-tensorrt repo (onnx/onnx-tensorrt#566) and a very recent fix for it in PR onnx/onnx-tensorrt#569.
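The failing layer applies BatchNorm to the 2-D embedding output pre_fc1 of shape (N, 512), which trips the parser's 4-D/5-D assertion. The usual workaround in such cases is to reshape (N, C) to (N, C, 1, 1), run a 4-D BatchNorm, and squeeze back; the result is numerically identical. A minimal numpy sketch of that equivalence (the helper name is illustrative, not from build_insight_trt.py):

```python
import numpy as np

def batchnorm(x, gamma, beta, mean, var, eps=1e-5):
    # Standard inference-time BatchNorm over the channel axis (axis 1).
    shape = [1, -1] + [1] * (x.ndim - 2)  # broadcast params over trailing dims
    g, b = gamma.reshape(shape), beta.reshape(shape)
    m, v = mean.reshape(shape), var.reshape(shape)
    return g * (x - m) / np.sqrt(v + eps) + b

rng = np.random.default_rng(0)
n, c = 4, 512
x = rng.standard_normal((n, c)).astype(np.float32)
gamma, beta = rng.standard_normal(c), rng.standard_normal(c)
mean, var = rng.standard_normal(c), rng.random(c) + 0.1

# BatchNorm applied directly to the 2-D (N, C) input...
y2d = batchnorm(x, gamma, beta, mean, var)
# ...matches BatchNorm on the 4-D (N, C, 1, 1) view, squeezed back.
y4d = batchnorm(x.reshape(n, c, 1, 1), gamma, beta, mean, var).reshape(n, c)

assert np.allclose(y2d, y4d)
```

This is only a demonstration of why the reshape trick is safe; the actual fix in onnx/onnx-tensorrt#569 handles the dimensionality inside the parser itself.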

I’ll try building it from source to check if it works.

It looks like the TensorRT ONNX parser is linked against the latest release of onnx-tensorrt. The fix for this issue is already in the master branch of onnx-tensorrt but hasn’t been released yet, so it should land in the TensorRT release that follows the next onnx-tensorrt release. Alternatively, you can try building onnx-tensorrt from source and use its built-in onnx2trt utility to build the engine.
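For reference, a rough sketch of building onnx-tensorrt from source and using onnx2trt (the TensorRT install path and output path here are assumptions; adjust to your environment):

```shell
# Build onnx-tensorrt from master, where the BatchNorm fix already landed.
git clone --recursive https://github.com/onnx/onnx-tensorrt.git
cd onnx-tensorrt
mkdir build && cd build
cmake .. -DTENSORRT_ROOT=/usr/lib/x86_64-linux-gnu   # point at your TensorRT install
make -j"$(nproc)"

# Use the built-in converter to build an engine directly from the ONNX file.
./onnx2trt /models/onnx/arcface_r100_v1/arcface_r100_v1.onnx \
    -o /models/trt-engines/arcface_r100_v1/arcface_r100_v1.plan
```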

1 reaction
SthPhoenix commented on Nov 15, 2020

I have tested conversion with batch size > 1. Yes, it seems to be broken on TRT 7.2 for now. The workaround with trtexec seems to work:

trtexec --onnx=/models/onnx/arcface_r100_v1/arcface_r100_v1.onnx \
    --explicitBatch \
    --minShapes=data:1x3x112x112 \
    --optShapes=data:10x3x112x112 \
    --maxShapes=data:10x3x112x112 \
    --saveEngine=/models/trt-engines/arcface_r100_v1/arcface_r100_v1_batched.plan
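Each --minShapes/--optShapes/--maxShapes argument maps an input tensor name to a shape, and together they define the optimization profile TensorRT builds for dynamic batching. As a quick sanity check, the spec format can be parsed like this (a hypothetical helper, not part of trtexec):

```python
def parse_shape_spec(spec: str) -> dict:
    """Parse a trtexec-style shape spec such as 'data:10x3x112x112'
    into {tensor_name: (dims...)}; multiple tensors are comma-separated."""
    shapes = {}
    for entry in spec.split(","):
        name, dims = entry.rsplit(":", 1)
        shapes[name] = tuple(int(d) for d in dims.split("x"))
    return shapes

# The profile used in the trtexec call above:
profile = {
    "min": parse_shape_spec("data:1x3x112x112"),
    "opt": parse_shape_spec("data:10x3x112x112"),
    "max": parse_shape_spec("data:10x3x112x112"),
}
print(profile["min"]["data"])  # -> (1, 3, 112, 112)
```

Min and max bound the batch sizes the engine will accept at runtime (1 through 10 here), while opt is the shape TensorRT tunes kernels for.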

Top Results From Across the Web

  • TensorRT Release 20.10 - NVIDIA Documentation Center
    The TensorRT C++ samples and C++ API documentation. The samples can be built by running make in the /workspace/tensorrt/samples directory.
  • Face Recognition with Arcface with TensorRT | by 楊亮魯
    I fail to run the TensorRT inference on Jetson Nano, due to Prelu not supported ... start container, build the arcface TensorRT engine; ...
  • Face Recognition: ONNX to TensorRT conversion for Arcface ...
    In this blogpost, I will explain the steps required in the model conversion of ONNX to TensorRT and the reason why my steps ...
  • Error occurred while running the TensorRT samples: [reformat ...
    TensorRT Version: 8.2.0.3 CPU Architecture: aarch64 GPU Type: GTX 1060 Nvidia Driver Version: 510 Operating System + Version: Ubuntu 20.04 ...
  • TensorRT/ONNX - eLinux.org
    5 How to use trtexec to run inference with dynamic shape? 6 How to convert onnx model to a tensorrt engine? 7 If ...
