Score/confidence of prediction drops a lot after converting to TRT engine

I am using Dyhead to train an image detection model: https://github.com/open-mmlab/mmdetection/blob/master/configs/dyhead/atss_swin-l-p4-w12_fpn_dyhead_mstrain_2x_coco.py

Using the GPU Docker image, the conversion to TensorRT with tools/deploy.py succeeds:

python3 tools/deploy.py /workdir/detection_onnx_static_1024x1024.py /workdir/atss_swin-l-p4-w12_fpn_dyhead_mstrain_2x.py /workdir/latest.pth /workdir/test-deploy-img-1024.jpg --device cuda --dump-info

However, the conversion emits a lot of warnings, such as the ones below:

/root/workspace/mmdeploy/mmdeploy/core/optimizers/function_marker.py:158: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  ys_shape = tuple(int(s) for s in ys.shape)
/root/workspace/mmdeploy/mmdeploy/codebase/mmdet/models/detectors/base.py:24: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
  img_shape = [int(val) for val in img_shape]
/root/workspace/mmdeploy/mmdeploy/codebase/mmdet/models/backbones.py:202: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  slice_w = (W + self.window_size - 1) // self.window_size * self.window_size
WARNING: The shape inference of mmdeploy::MMCVModulatedDeformConv2d type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of mmdeploy::TRTInstanceNormalization type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
[09/01/2022-01:33:56] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +809, GPU +348, now: CPU 3460, GPU 785 (MiB)
[09/01/2022-01:33:56] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +126, GPU +60, now: CPU 3586, GPU 845 (MiB)
[09/01/2022-01:33:56] [TRT] [W] TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.3.2
[09/01/2022-01:33:56] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[09/01/2022-01:35:06] [TRT] [I] Some tactics do not have sufficient workspace memory to run. Increasing workspace size will enable more tactics, please check verbose output for requested sizes.
[09/01/2022-01:36:46] [TRT] [I] Detected 1 inputs and 2 output network tensors.
[09/01/2022-01:36:48] [TRT] [I] Total Host Persistent Memory: 500480
[09/01/2022-01:36:48] [TRT] [I] Total Device Persistent Memory: 393728
[09/01/2022-01:36:48] [TRT] [I] Total Scratch Memory: 1918222336
[09/01/2022-01:36:48] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 0 MiB
[09/01/2022-01:36:53] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 5282.29ms to assign 35 blocks to 1075 nodes requiring 2039239680 bytes.
[09/01/2022-01:36:53] [TRT] [I] Total Activation Memory: 2039239680
[09/01/2022-01:36:53] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 5632, GPU 2311 (MiB)
[09/01/2022-01:36:53] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 5632, GPU 2319 (MiB)
[09/01/2022-01:36:53] [TRT] [W] TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.3.2
[09/01/2022-01:36:53] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[09/01/2022-01:36:53] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
[09/01/2022-01:36:53] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
2022-09-01 01:36:54,590 - mmdeploy - INFO - Finish pipeline mmdeploy.backend.tensorrt.onnx2tensorrt.onnx2tensorrt
2022-09-01 01:36:55,388 - mmdeploy - WARNING - "visualize_model" has been skipped may be because it's             running on a headless device.
2022-09-01 01:36:55,388 - mmdeploy - INFO - All process success.

I know the results are not exactly the same after conversion, so I am okay with some difference in the bounding box values (although they are off by quite a bit too), but the score drops far too much.

Below is the TRT engine result (x1, y1, x2, y2, score):
[8.8506012, 358.41714, 149.80162, 495.56137, 0.081301391]

And below is the original prediction with mmdet, with the bbox values rounded down:
[0, 328, 165, 526, 0.53286]
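To put a number on the gap between the two results above, here is a minimal sketch that computes the box overlap (IoU) and the score difference for the two detections quoted in this issue. The helper `iou` and the variable names are illustrative and not part of mmdet or mmdeploy; the sketch also assumes both boxes are expressed in the same coordinate frame, which may not hold if one result was produced on a resized image.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Values copied from the issue: (x1, y1, x2, y2, score).
trt_det = [8.8506012, 358.41714, 149.80162, 495.56137, 0.081301391]
torch_det = [0, 328, 165, 526, 0.53286]

overlap = iou(trt_det[:4], torch_det[:4])
score_gap = torch_det[4] - trt_det[4]
print(f"IoU: {overlap:.3f}, score gap: {score_gap:.3f}")
```

A low IoU combined with a large score gap suggests the mismatch is more than ordinary numerical noise from the conversion.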

Here is my environment, as reported by python3 tools/check_env.py:

2022-09-01 02:08:19,488 - mmdeploy - INFO - 

2022-09-01 02:08:19,489 - mmdeploy - INFO - **********Environmental information**********
2022-09-01 02:08:19,681 - mmdeploy - INFO - sys.platform: linux
2022-09-01 02:08:19,681 - mmdeploy - INFO - Python: 3.8.13 (default, Mar 28 2022, 11:38:47) [GCC 7.5.0]
2022-09-01 02:08:19,681 - mmdeploy - INFO - CUDA available: True
2022-09-01 02:08:19,681 - mmdeploy - INFO - GPU 0: NVIDIA RTX A4000
2022-09-01 02:08:19,681 - mmdeploy - INFO - CUDA_HOME: /usr/local/cuda
2022-09-01 02:08:19,681 - mmdeploy - INFO - NVCC: Cuda compilation tools, release 11.6, V11.6.124
2022-09-01 02:08:19,681 - mmdeploy - INFO - GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
2022-09-01 02:08:19,681 - mmdeploy - INFO - PyTorch: 1.12.0
2022-09-01 02:08:19,681 - mmdeploy - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2022.0-Product Build 20211112 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.6
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.3.2  (built against CUDA 11.5)
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

2022-09-01 02:08:19,681 - mmdeploy - INFO - TorchVision: 0.13.0
2022-09-01 02:08:19,681 - mmdeploy - INFO - OpenCV: 4.6.0
2022-09-01 02:08:19,681 - mmdeploy - INFO - MMCV: 1.6.1
2022-09-01 02:08:19,681 - mmdeploy - INFO - MMCV Compiler: GCC 9.3
2022-09-01 02:08:19,681 - mmdeploy - INFO - MMCV CUDA Compiler: 11.6
2022-09-01 02:08:19,681 - mmdeploy - INFO - MMDeploy: 0.7.0+21775ce
2022-09-01 02:08:19,681 - mmdeploy - INFO - 

2022-09-01 02:08:19,681 - mmdeploy - INFO - **********Backend information**********
2022-09-01 02:08:20,066 - mmdeploy - INFO - onnxruntime: 1.8.1	ops_is_avaliable : True
2022-09-01 02:08:20,091 - mmdeploy - INFO - tensorrt: 8.4.3.1	ops_is_avaliable : True
2022-09-01 02:08:20,105 - mmdeploy - INFO - ncnn: None	ops_is_avaliable : False
2022-09-01 02:08:20,106 - mmdeploy - INFO - pplnn_is_avaliable: False
2022-09-01 02:08:20,107 - mmdeploy - INFO - openvino_is_avaliable: False
2022-09-01 02:08:20,121 - mmdeploy - INFO - snpe_is_available: False
2022-09-01 02:08:20,121 - mmdeploy - INFO - 

2022-09-01 02:08:20,121 - mmdeploy - INFO - **********Codebase information**********
2022-09-01 02:08:20,122 - mmdeploy - INFO - mmdet:	2.25.1
2022-09-01 02:08:20,122 - mmdeploy - INFO - mmseg:	None
2022-09-01 02:08:20,122 - mmdeploy - INFO - mmcls:	None
2022-09-01 02:08:20,122 - mmdeploy - INFO - mmocr:	None
2022-09-01 02:08:20,122 - mmdeploy - INFO - mmedit:	None
2022-09-01 02:08:20,122 - mmdeploy - INFO - mmdet3d:	None
2022-09-01 02:08:20,122 - mmdeploy - INFO - mmpose:	None
2022-09-01 02:08:20,122 - mmdeploy - INFO - mmrotate:	None

And here is my deploy config (/workdir/detection_onnx_static_1024x1024.py):

codebase_config = dict(
    type='mmdet',
    task='ObjectDetection',
    model_type='end2end',
    post_processing=dict(
        score_threshold=0.05,
        confidence_threshold=0.005,  # for YOLOv3
        iou_threshold=0.5,
        max_output_boxes_per_class=200,
        pre_top_k=5000,
        keep_top_k=100,
        background_label_id=-1,
    )
)

onnx_config = dict(
    type='onnx',
    export_params=True,
    keep_initializers_as_inputs=False,
    opset_version=11,
    save_file='dyhead_swin_1024.onnx',
    input_names=['input'],
    output_names=['dets', 'labels'],
    input_shape=[1024, 1024],
    optimize=True
)

backend_config = dict(
    type='tensorrt',
    common_config=dict(fp16_mode=False, max_workspace_size=8 << 30),
    model_inputs=[
        dict(
            input_shapes=dict(
                input=dict(
                    min_shape=[1, 3, 1024, 1024],
                    opt_shape=[1, 3, 1024, 1024],
                    max_shape=[1, 3, 1024, 1024]
                )
            )
        )
    ]
)

I also tried the default TensorRT version 8.2.x, without success.

Could someone please help? Thanks a lot!!

Top GitHub Comments

vedrusss commented, Dec 9, 2022

I’ve done that. I just needed to rescale the detected boxes to the original image shape. Now I can confirm that the Dyhead model converted to TRT works fine: the boxes and scores are the same as those obtained from the original model. There are small discrepancies for some detections; I believe they come from small numerical differences introduced during model conversion, which is normal. Thanks to @hanrui1sensetime @grimoire
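The rescaling step mentioned here can be sketched as follows. This is only an illustration under stated assumptions: the function name `rescale_boxes` is hypothetical, and it assumes the image was stretched directly to the network input size (1024x1024 in the deploy config above) with independent x and y factors. If the test pipeline keeps the aspect ratio or pads the image, the factors would have to be derived differently.

```python
def rescale_boxes(dets, input_hw, orig_hw):
    """Map (x1, y1, x2, y2, score) boxes from network-input pixels
    back to original-image pixels.

    Assumes a plain resize (no aspect-ratio preservation, no padding).
    """
    in_h, in_w = input_hw
    orig_h, orig_w = orig_hw
    sx = orig_w / in_w  # horizontal scale factor
    sy = orig_h / in_h  # vertical scale factor
    return [[x1 * sx, y1 * sy, x2 * sx, y2 * sy, score]
            for x1, y1, x2, y2, score in dets]


# Hypothetical detection on a 1024x1024 input, original image 2048 high, 1536 wide.
dets = [[100.0, 200.0, 300.0, 400.0, 0.9]]
rescaled = rescale_boxes(dets, input_hw=(1024, 1024), orig_hw=(2048, 1536))
print(rescaled)
```

Scores are passed through unchanged; only the coordinates depend on the image size.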

BTW, so far I’ve tested only FP32 mode. I’m going to run the same test for FP16.
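For the FP16 run, the backend section of the deploy config from the original post would presumably differ only in the fp16_mode flag. The fragment below is a sketch that reuses the values from that config; whether any other setting needs to change for this model is not something this thread confirms.

```python
# Same backend_config as in the original post, with FP16 enabled.
backend_config = dict(
    type='tensorrt',
    common_config=dict(fp16_mode=True, max_workspace_size=8 << 30),
    model_inputs=[
        dict(
            input_shapes=dict(
                input=dict(
                    min_shape=[1, 3, 1024, 1024],
                    opt_shape=[1, 3, 1024, 1024],
                    max_shape=[1, 3, 1024, 1024]
                )
            )
        )
    ]
)
```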

vedrusss commented, Dec 8, 2022

@tak-ho-raspect, @hanrui1sensetime, I’ve found that the discrepancy in rewrite_outputs (in the test code above) disappears if, here, the index is changed from 1 (taking the offsets' shape as the output shape) to 0 (taking the input's shape as the output shape):

ret.d[2] = inputs[0].d[2]; ret.d[3] = inputs[0].d[3];

But after this fix, the onnx2tensorrt conversion crashes with an error about wrong dimensions in a graph node:

[12/08/2022-01:44:30] [TRT] [I] MatMul_1915: broadcasting input1 to make tensors conform, dims(input0)=[20,144,1536][NONE] dims(input1)=[1,1536,1536][NONE].
[12/08/2022-01:44:30] [TRT] [E] [graphShapeAnalyzer.cpp::analyzeShapes::1285] Error Code 4: Miscellaneous (IElementWiseLayer Add_2075: broadcast dimensions must be conformable)
Traceback (most recent call last):
  File "tools/onnx2tensorrt.py", line 73, in <module>
    main()
  File "tools/onnx2tensorrt.py", line 58, in main
    from_onnx(
  File "/root/workspace/mmdeploy/mmdeploy/backend/tensorrt/utils.py", line 165, in from_onnx
    raise RuntimeError(f'Failed to parse onnx, {error_msgs}')
RuntimeError: Failed to parse onnx, In node 2075 (parseGraph): INVALID_NODE: Invalid Node - Add_2075
[graphShapeAnalyzer.cpp::analyzeShapes::1285] Error Code 4: Miscellaneous (IElementWiseLayer Add_2075: broadcast dimensions must be conformable)
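As a rough illustration of what "broadcast dimensions must be conformable" means for an elementwise Add, the sketch below applies the usual right-aligned broadcasting rule: two dimensions are compatible when they are equal or one of them is 1. The function name and the example shapes are hypothetical (the log does not show the actual input shapes of Add_2075), and TensorRT's own check may be stricter, for example about tensor rank.

```python
from itertools import zip_longest


def broadcast_conformable(shape_a, shape_b):
    """Return True if two shapes can be broadcast together under the
    NumPy-style rule (compare from the trailing dimension backwards)."""
    for a, b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if a != b and a != 1 and b != 1:
            return False
    return True


# Hypothetical shapes: the first pair broadcasts, the second does not
# because 144 and 100 differ and neither is 1.
ok = broadcast_conformable([20, 144, 1536], [1, 1, 1536])
bad = broadcast_conformable([20, 144, 1536], [20, 100, 1536])
print(ok, bad)
```

A mismatch like the second pair could arise when a custom op reports an output shape that differs from what the downstream nodes were traced with, which is one possible reading of the crash above.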