[Bug] The pointPillars model got wrong output when I use TensorRT acceleration

Checklist

  1. I have searched related issues but cannot get the expected help.
  2. I have read the FAQ documentation but cannot get the expected help.
  3. The bug has not been fixed in the latest version.

Describe the bug

Hi, I used mmdeploy/tools/test.py to test the converted PointPillars ONNX model. The run finished successfully, but the AP results are abnormal:

----------- AP11 Results ------------

Pedestrian AP11@0.50, 0.50, 0.50:
bbox AP11:0.0000, 0.0000, 0.0000
bev AP11:0.0000, 0.0000, 0.0000
3d AP11:0.0000, 0.0000, 0.0000
aos AP11:0.00, 0.00, 0.00
Pedestrian AP11@0.50, 0.25, 0.25:
bbox AP11:0.0000, 0.0000, 0.0000
bev AP11:0.0000, 0.0000, 0.0000
3d AP11:0.0000, 0.0000, 0.0000
aos AP11:0.00, 0.00, 0.00
Cyclist AP11@0.50, 0.50, 0.50:
bbox AP11:0.0000, 0.0000, 0.0000
bev AP11:0.0000, 0.0000, 0.0000
3d AP11:0.0000, 0.0000, 0.0000
aos AP11:0.00, 0.00, 0.00
Cyclist AP11@0.50, 0.25, 0.25:
bbox AP11:0.0000, 0.0000, 0.0000
bev AP11:0.0000, 0.0000, 0.0000
3d AP11:0.0000, 0.0000, 0.0000
aos AP11:0.00, 0.00, 0.00
Car AP11@0.70, 0.70, 0.70:
bbox AP11:0.0000, 9.0909, 9.0909
bev AP11:0.0000, 9.0909, 9.0909
3d AP11:0.0000, 9.0909, 9.0909
aos AP11:0.00, 9.09, 9.09
Car AP11@0.70, 0.50, 0.50:
bbox AP11:0.0000, 9.0909, 9.0909
bev AP11:0.0000, 9.0909, 9.0909
3d AP11:0.0000, 9.0909, 9.0909
aos AP11:0.00, 9.09, 9.09

Overall AP11@easy, moderate, hard:
bbox AP11:0.0000, 3.0303, 3.0303
bev AP11:0.0000, 3.0303, 3.0303
3d AP11:0.0000, 3.0303, 3.0303
aos AP11:0.00, 3.03, 3.03

----------- AP40 Results ------------

Pedestrian AP40@0.50, 0.50, 0.50:
bbox AP40:0.0000, 0.0000, 0.0000
bev AP40:0.0000, 0.0000, 0.0000
3d AP40:0.0000, 0.0000, 0.0000
aos AP40:0.00, 0.00, 0.00
Pedestrian AP40@0.50, 0.25, 0.25:
bbox AP40:0.0000, 0.0000, 0.0000
bev AP40:0.0000, 0.0000, 0.0000
3d AP40:0.0000, 0.0000, 0.0000
aos AP40:0.00, 0.00, 0.00
Cyclist AP40@0.50, 0.50, 0.50:
bbox AP40:0.0000, 0.0000, 0.0000
bev AP40:0.0000, 0.0000, 0.0000
3d AP40:0.0000, 0.0000, 0.0000
aos AP40:0.00, 0.00, 0.00
Cyclist AP40@0.50, 0.25, 0.25:
bbox AP40:0.0000, 0.0000, 0.0000
bev AP40:0.0000, 0.0000, 0.0000
3d AP40:0.0000, 0.0000, 0.0000
aos AP40:0.00, 0.00, 0.00
Car AP40@0.70, 0.70, 0.70:
bbox AP40:0.0000, 2.5000, 2.5000
bev AP40:0.0000, 2.5000, 2.5000
3d AP40:0.0000, 2.5000, 2.5000
aos AP40:0.00, 2.50, 2.50
Car AP40@0.70, 0.50, 0.50:
bbox AP40:0.0000, 2.5000, 2.5000
bev AP40:0.0000, 2.5000, 2.5000
3d AP40:0.0000, 2.5000, 2.5000
aos AP40:0.00, 2.50, 2.50

Overall AP40@easy, moderate, hard:
bbox AP40:0.0000, 0.8333, 0.8333
bev AP40:0.0000, 0.8333, 0.8333
3d AP40:0.0000, 0.8333, 0.8333
aos AP40:0.00, 0.83, 0.83
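
A note on reading these numbers: the non-zero Car values match 100/11 and 100/40 exactly, which is what an N-point interpolated AP gives when precision is non-zero at only one recall sample, and the Overall rows equal the mean over the three classes. This is an interpretation, assuming the usual 11-point and 40-point AP definitions; the function names below are illustrative only and are not part of mmdet3d. A minimal sketch of the arithmetic:

```python
def single_point_ap(num_recall_points):
    """AP in percent when precision is 1.0 at exactly one recall sample and 0 elsewhere."""
    return 100.0 / num_recall_points


def mean_over_classes(per_class_ap):
    """Plain average of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)


# Class order follows the report: Pedestrian, Cyclist, Car.
car_ap11 = single_point_ap(11)
car_ap40 = single_point_ap(40)
overall_ap11 = mean_over_classes([0.0, 0.0, car_ap11])
overall_ap40 = mean_over_classes([0.0, 0.0, car_ap40])

print(f"AP11: car={car_ap11:.4f} overall={overall_ap11:.4f}")  # car=9.0909 overall=3.0303
print(f"AP40: car={car_ap40:.4f} overall={overall_ap40:.4f}")  # car=2.5000 overall=0.8333
```

If this reading is right, the reported scores carry almost no information about real detections, which fits the empty outputs described further down.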

I then checked the data_loader, and the inputs seem correct. I also checked the ONNX model: it looks fine, and its Netron visualization is almost the same as the PointPillars ONNX file from this link (which I found in this issue: https://github.com/NVIDIA/TensorRT/issues/2338): https://drive.google.com/file/d/1FuZJWLIsJyUsUk_lM1euXzyPgagu-tXj/view?usp=sharing I converted both ONNX files to .engine files and tested them, but got the same results.

So I printed the outputs and found that outputs = task_processor.single_gpu_test(model, data_loader, args.show, args.show_dir) returned empty results such as:

{'boxes_3d': LiDARInstance3DBoxes( tensor([], size=(0, 7))), 'scores_3d': tensor([]), 'labels_3d': tensor([], dtype=torch.int64)},
{'boxes_3d': LiDARInstance3DBoxes( tensor([], size=(0, 7))), 'scores_3d': tensor([]), 'labels_3d': tensor([], dtype=torch.int64)},
{'boxes_3d': LiDARInstance3DBoxes( tensor([], size=(0, 7))), 'scores_3d': tensor([]), 'labels_3d': tensor([], dtype=torch.int64)},
{'boxes_3d': LiDARInstance3DBoxes( tensor([], size=(0, 7))), 'scores_3d': tensor([]), 'labels_3d': tensor([], dtype=torch.int64)},
{'boxes_3d': LiDARInstance3DBoxes( tensor([], size=(0, 7))), 'scores_3d': tensor([]), 'labels_3d': tensor([], dtype=torch.int64)},
{'boxes_3d': LiDARInstance3DBoxes( tensor([], size=(0, 7))), 'scores_3d': tensor([]), 'labels_3d': tensor([], dtype=torch.int64)},
… …
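
To quantify how many frames come back empty instead of eyeballing the printed list, a small helper along these lines may be useful. It is a sketch that assumes each result is a dict with a 'scores_3d' entry supporting len(), as in the output printed in this report; count_empty_results is a hypothetical name, not an mmdeploy function, and plain lists stand in for tensors so the example runs without torch.

```python
def count_empty_results(outputs):
    """Return (frames with zero detections, total frames)."""
    empty = sum(1 for result in outputs if len(result["scores_3d"]) == 0)
    return empty, len(outputs)


# Stand-in data: two empty frames and one frame with a single detection.
demo_outputs = [
    {"scores_3d": []},
    {"scores_3d": []},
    {"scores_3d": [0.87]},
]

empty, total = count_empty_results(demo_outputs)
print(f"{empty}/{total} frames have no detections")  # 2/3 frames have no detections
```

Calling it on the list returned by single_gpu_test should show whether every frame is empty or only some of them.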

So I would like to ask: what could be causing the wrong test results, and how can I fix it? Thanks very much!

Reproduction

I use this command to convert the model:

python mmdeploy/tools/deploy.py mmdeploy/configs/mmdet3d/voxel-detection/voxel-detection_tensorrt_dynamic-kitti-32x4.py mmdetection3d/configs/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-3class.py checkpoints/hv_pointpillars_secfpn_6x8_160e_kitti-3d-3class_20220301_150306-37dc2420.pth mmdetection3d/demo/data/kitti/kitti_000008.bin --work-dir work-dir2 --device cuda:0 --show

I use this command to test the converted model:

python …/mmdeploy/tools/test.py …/mmdeploy/configs/mmdet3d/voxel-detection/voxel-detection_tensorrt_dynamic-kitti-32x4.py ./configs/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-3class.py --model …/mmdeploy/work-dir2/end2end.engine --metrics bbox --device cuda:0

Environment

2022-12-12 18:08:33,002 - mmdeploy - INFO - 

2022-12-12 18:08:33,002 - mmdeploy - INFO - **********Environmental information**********
2022-12-12 18:08:33,416 - mmdeploy - INFO - sys.platform: linux
2022-12-12 18:08:33,416 - mmdeploy - INFO - Python: 3.7.13 (default, Mar 29 2022, 02:18:16) [GCC 7.5.0]
2022-12-12 18:08:33,416 - mmdeploy - INFO - CUDA available: True
2022-12-12 18:08:33,416 - mmdeploy - INFO - GPU 0,1,2,3,4,5,6,7,8,9: NVIDIA GeForce RTX 3090
2022-12-12 18:08:33,416 - mmdeploy - INFO - CUDA_HOME: /usr/local/cuda
2022-12-12 18:08:33,416 - mmdeploy - INFO - NVCC: Cuda compilation tools, release 11.3, V11.3.109
2022-12-12 18:08:33,416 - mmdeploy - INFO - GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
2022-12-12 18:08:33,416 - mmdeploy - INFO - PyTorch: 1.11.0
2022-12-12 18:08:33,416 - mmdeploy - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.3
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.2
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

2022-12-12 18:08:33,416 - mmdeploy - INFO - TorchVision: 0.12.0
2022-12-12 18:08:33,416 - mmdeploy - INFO - OpenCV: 4.5.5
2022-12-12 18:08:33,416 - mmdeploy - INFO - MMCV: 1.5.2
2022-12-12 18:08:33,416 - mmdeploy - INFO - MMCV Compiler: GCC 7.5
2022-12-12 18:08:33,416 - mmdeploy - INFO - MMCV CUDA Compiler: 11.3
2022-12-12 18:08:33,416 - mmdeploy - INFO - MMDeploy: 0.10.0+99040d5
2022-12-12 18:08:33,416 - mmdeploy - INFO - 

2022-12-12 18:08:33,417 - mmdeploy - INFO - **********Backend information**********
2022-12-12 18:08:34,080 - mmdeploy - INFO - onnxruntime: None	ops_is_avaliable : False
2022-12-12 18:08:34,108 - mmdeploy - INFO - tensorrt: 8.5.1.7	ops_is_avaliable : True
2022-12-12 18:08:34,124 - mmdeploy - INFO - ncnn: None	ops_is_avaliable : False
2022-12-12 18:08:34,125 - mmdeploy - INFO - pplnn_is_avaliable: False
2022-12-12 18:08:34,126 - mmdeploy - INFO - openvino_is_avaliable: False
2022-12-12 18:08:34,142 - mmdeploy - INFO - snpe_is_available: False
2022-12-12 18:08:34,143 - mmdeploy - INFO - ascend_is_available: False
2022-12-12 18:08:34,144 - mmdeploy - INFO - coreml_is_available: False
2022-12-12 18:08:34,144 - mmdeploy - INFO - 

2022-12-12 18:08:34,144 - mmdeploy - INFO - **********Codebase information**********
2022-12-12 18:08:34,146 - mmdeploy - INFO - mmdet:	2.24.1
2022-12-12 18:08:34,146 - mmdeploy - INFO - mmseg:	0.24.1
2022-12-12 18:08:34,146 - mmdeploy - INFO - mmcls:	0.23.0
2022-12-12 18:08:34,146 - mmdeploy - INFO - mmocr:	None
2022-12-12 18:08:34,146 - mmdeploy - INFO - mmedit:	None
2022-12-12 18:08:34,146 - mmdeploy - INFO - mmdet3d:	1.0.0rc4
2022-12-12 18:08:34,146 - mmdeploy - INFO - mmpose:	None
2022-12-12 18:08:34,146 - mmdeploy - INFO - mmrotate:	None

Error traceback

Nothing

Issue Analytics

  • State: open
  • Created 9 months ago
  • Comments: 7

Top GitHub Comments

1 reaction
ykqyzzs commented, Dec 19, 2022

I tested again on another server with CUDA 10.2 + TensorRT 8.4.3.1. The problem was gone and I got the correct results. Thanks a lot. But I would still like to know what went wrong in my original environment in this issue…

0 reactions
tpoisonooo commented, Dec 20, 2022

I don't have the TRT source code, so these are only assumptions.

  1. Open your .onnx file; you will find a Scatter or ScatterND operator.
  2. TRT has some bug when processing this operator with a dynamic input shape.
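
The first step can also be done without opening the file in a viewer. The sketch below lists scatter-type nodes in an ONNX graph. It assumes the onnx Python package is installed for the loading step; find_scatter_nodes and load_graph_nodes are hypothetical helper names, and including ScatterElements in the set is an addition of this sketch, not something the comment mentions. The demo uses stand-in node objects so it runs without a model file.

```python
from types import SimpleNamespace

# Operator names to look for. ScatterElements is included as an assumption,
# since it is the successor of the deprecated Scatter operator.
SCATTER_OPS = {"Scatter", "ScatterND", "ScatterElements"}


def find_scatter_nodes(nodes):
    """Return (name, op_type) for every node whose op_type is a scatter variant."""
    return [(n.name, n.op_type) for n in nodes if n.op_type in SCATTER_OPS]


def load_graph_nodes(onnx_path):
    """Load an ONNX file and return its top-level graph nodes."""
    import onnx  # imported lazily so find_scatter_nodes works without onnx

    return onnx.load(onnx_path).graph.node


# Stand-in nodes for demonstration only; a real graph comes from load_graph_nodes.
demo_nodes = [
    SimpleNamespace(name="conv_0", op_type="Conv"),
    SimpleNamespace(name="scatter_0", op_type="ScatterND"),
]
print(find_scatter_nodes(demo_nodes))  # [('scatter_0', 'ScatterND')]
```

On a real export, the call would look like find_scatter_nodes(load_graph_nodes(path)), where path points at the .onnx file written by the conversion step. Only the top-level graph is scanned, so nodes nested inside subgraphs (for example in If or Loop bodies) would not be reported.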