Unable to autofill for 'yolov4_nvidia', either all model tensor configuration should specify their dims or none
Description
I am trying to convert the pre-trained PyTorch YOLOv4 (darknet) model to a TensorRT INT8 engine with dynamic batching, to later deploy it on DS-Triton. I am following the general steps in the NVIDIA-AI-IOT/yolov4_deepstream sample, but I first hit issues with dynamic dimensions at the ONNX-to-TensorRT conversion step, and then when loading the model on DS-Triton:
Environment
TensorRT Version: 7.2.1
NVIDIA GPU: T4
NVIDIA Driver Version: 450.51.06
CUDA Version: 11.1
CUDNN Version: 8.0.4
Operating System: Ubuntu 18.04
Python Version (if applicable): 1.8
Tensorflow Version (if applicable):
PyTorch Version (if applicable): container image nvcr.io/nvidia/pytorch:20.11-py3
Baremetal or Container (if so, version): container image deepstream:5.1-21.02-triton
Relevant Files
YOLOV4 pre-trained model weights and cfg downloaded from https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4.cfg https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.weights
Steps To Reproduce
Complete pipeline: PyTorch YOLOv4 (darknet) --> ONNX --> TensorRT --> DeepStream-Triton
Step 1: download cfg file and weights from the above link
Step 2: git clone repository pytorch-YOLOv4
$ sudo git clone https://github.com/Tianxiaomo/pytorch-YOLOv4.git
Step 3: Convert model YOLOv4 PyTorch --> ONNX | Dynamic batch size
$ sudo docker run --gpus all -it --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -v l/pytorch-YOLOv4/:/workspace/pytorch-YOLOv4/ nvcr.io/nvidia/pytorch:20.11-py3
$ cd /workspace/pytorch-YOLOv4
$ python demo_darknet2onnx.py "/workspace/pytorch-YOLOv4/models_cfg_weights/yolov4.cfg" "/workspace/pytorch-YOLOv4/models_cfg_weights/yolov4.weights" "/workspace/pytorch-YOLOv4/data/dog.jpg" -1
Result:
Onnx model exporting done
The model expects input shape: ['batch_size', 3, 608, 608]
Saved model: yolov4_-1_3_608_608_dynamic.onnx
Step 4: Convert model ONNX --> TensorRT | Dynamic Batch size
$ sudo docker run --gpus all -it --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=$DISPLAY -v /pytorch-YOLOv4/:/workspace/pytorch-YOLOv4/ deepstream:5.1-21.02-triton
$ /usr/src/tensorrt/bin/trtexec --onnx=yolov4_-1_3_608_608_dynamic.onnx --explicitBatch --minShapes=\'data\':1x3x608x608 --optShapes=\'data\':2x3x608x608 --maxShapes=\'data\':8x3x608x608 --workspace=4096 --buildOnly --saveEngine=yolov4_-1_3_608_608_dynamic_onnx_int8.engine --int8
Note: trtexec automatically overrides the engine shape to 1x3x608x608 instead of keeping dynamic batching. The warning below hints at why: the shape flags were bound to 'data', but the network input is named 'input', so trtexec behaved as if no shapes were provided.
[03/09/2021-22:24:24] [W] Dynamic dimensions required for input: input, but no shapes were provided. Automatically overriding shape to: 1x3x608x608
[03/09/2021-22:24:24] [I] FP32 and INT8 precisions have been specified - more performance might be enabled by additionally specifying --fp16 or --best
[03/09/2021-22:24:25] [W] [TRT] Calibrator is not being used. Users must provide dynamic range for all tensors that are not Int32.
[03/09/2021-22:43:52] [I] [TRT] Detected 1 inputs and 8 output network tensors.
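Given the warning above, a plausible fix is to bind the shape flags to the tensor name the exporter actually used ('input', inferred from the warning text) and drop the escaped quotes, which make the literal quote characters part of the tensor name. This is a hedged sketch of the corrected build command, not verified here:

```shell
# Sketch: rebuild with the optimization-profile shapes bound to the real
# input tensor name. 'input' is inferred from trtexec's warning; confirm
# the name against the ONNX graph before running.
/usr/src/tensorrt/bin/trtexec \
  --onnx=yolov4_-1_3_608_608_dynamic.onnx \
  --explicitBatch \
  --minShapes=input:1x3x608x608 \
  --optShapes=input:2x3x608x608 \
  --maxShapes=input:8x3x608x608 \
  --workspace=4096 --buildOnly --int8 \
  --saveEngine=yolov4_-1_3_608_608_dynamic_onnx_int8.engine
```

With the shapes actually accepted, the resulting engine should report a dynamic batch dimension instead of being frozen at 1x3x608x608.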
$ /usr/src/tensorrt/bin/trtexec --loadEngine=yolov4_-1_3_608_608_dynamic_onnx_int8.engine --int8
Result BS=1:
.
[03/09/2021-22:48:45] [E] [TRT] Parameter check failed at: engine.cpp::enqueue::445, condition: batchSize > 0 && batchSize <= mEngine.getMaxBatchSize(). Note: Batch size was: …, but engine max batch size was: 1
[03/09/2021-22:48:45] [I] Warmup completed 312 queries over 200 ms
[03/09/2021-22:48:45] [I] Timing trace has 4704 queries over 3.0043 s
[03/09/2021-22:48:45] [I] Trace averages of 10 runs:
.
[03/09/2021-22:46:29] [I] Host Latency
[03/09/2021-22:46:29] [I] min: 6.81131 ms (end to end 11.6827 ms)
[03/09/2021-22:46:29] [I] max: 10.3354 ms (end to end 21.7613 ms)
[03/09/2021-22:46:29] [I] mean: 7.02095 ms (end to end 12.1098 ms)
[03/09/2021-22:46:29] [I] median: 7.00833 ms (end to end 12.0729 ms)
[03/09/2021-22:46:29] [I] percentile: 7.2074 ms at 99% (end to end 12.4701 ms at 99%)
[03/09/2021-22:46:29] [I] throughput: 163.949 qps
[03/09/2021-22:46:29] [I] walltime: 3.02533 s
[03/09/2021-22:46:29] [I] Enqueue Time
[03/09/2021-22:46:29] [I] min: 1.49683 ms
[03/09/2021-22:46:29] [I] max: 1.841 ms
[03/09/2021-22:46:29] [I] median: 1.52332 ms
[03/09/2021-22:46:29] [I] GPU Compute
[03/09/2021-22:46:29] [I] min: 5.86343 ms
[03/09/2021-22:46:29] [I] max: 9.38628 ms
[03/09/2021-22:46:29] [I] mean: 6.0721 ms
[03/09/2021-22:46:29] [I] median: 6.05927 ms
[03/09/2021-22:46:29] [I] percentile: 6.25732 ms at 99%
[03/09/2021-22:46:29] [I] total compute time: 3.01176 s
Result BS=2:
Error:
03/09/2021-22:48:45] [E] [TRT] Parameter check failed at: engine.cpp::enqueue::445, condition: batchSize > 0 && batchSize <= mEngine.getMaxBatchSize(). Note: Batch size was: 2, but engine max batch size was: 1
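Note that even with a correctly built dynamic engine, trtexec needs to be told which shape to run at when benchmarking; for engines with dynamic dimensions the runtime shape must be supplied explicitly. A hedged sketch (assuming the input tensor is named 'input'):

```shell
# Sketch: benchmark the dynamic engine at batch size 2 by passing the
# runtime shape explicitly via --shapes.
/usr/src/tensorrt/bin/trtexec \
  --loadEngine=yolov4_-1_3_608_608_dynamic_onnx_int8.engine \
  --shapes=input:2x3x608x608 --int8
```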
Step 5: Config the DS-Triton files as described in the sample NVIDIA-AI-IOT/yolov4_deepstream
Step 6: Run YOLOV4 INT8 mode with Dynamic shapes with DS-Triton
$ deepstream-app -c deepstream_app_config_yoloV4.txt
Error: “unable to autofill for ‘yolov4_nvidia’, either all model tensor configuration should specify their dims or none”
root@1101333383d9:/workspace/Deepstream_5.1_Triton/samples/configs/deepstream-app-trtis# deepstream-app -c source1_primary_yolov4.txt
I0309 23:25:10.628131 260 metrics.cc:219] Collecting metrics for GPU 0: Tesla T4
I0309 23:25:10.634856 260 metrics.cc:219] Collecting metrics for GPU 1: Tesla T4
I0309 23:25:10.641297 260 metrics.cc:219] Collecting metrics for GPU 2: Tesla T4
I0309 23:25:10.647843 260 metrics.cc:219] Collecting metrics for GPU 3: Tesla T4
I0309 23:25:10.706528 260 pinned_memory_manager.cc:199] Pinned memory pool is created at '0x7febf8000000' with size 268435456
I0309 23:25:10.710959 260 cuda_memory_manager.cc:99] CUDA memory pool is created on device 0 with size 67108864
I0309 23:25:10.710967 260 cuda_memory_manager.cc:99] CUDA memory pool is created on device 1 with size 67108864
I0309 23:25:10.710972 260 cuda_memory_manager.cc:99] CUDA memory pool is created on device 2 with size 67108864
I0309 23:25:10.710976 260 cuda_memory_manager.cc:99] CUDA memory pool is created on device 3 with size 67108864
I0309 23:25:10.991848 260 server.cc:141]
| Backend | Config | Path |
I0309 23:25:10.991880 260 server.cc:184]
| Model | Version | Status |
I0309 23:25:10.991971 260 tritonserver.cc:1620]
| Option | Value |
| server_id | triton |
| server_version | 2.5.0 |
| server_extensions | classification sequence model_repository schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
| model_repository_path[0] | /workspace/Deepstream_5.1_Triton/samples/trtis_model_repo |
| model_control_mode | MODE_EXPLICIT |
| strict_model_config | 0 |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| cuda_memory_pool_byte_size{1} | 67108864 |
| cuda_memory_pool_byte_size{2} | 67108864 |
| cuda_memory_pool_byte_size{3} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
E0309 23:25:22.300254 260 model_repository_manager.cc:1705] unable to autofill for 'yolov4_nvidia', either all model tensor configuration should specify their dims or none.
ERROR: infer_trtis_server.cpp:1044 Triton: failed to load model yolov4_nvidia, triton_err_str:Internal, err_msg:failed to load 'yolov4_nvidia', no version is available
ERROR: infer_trtis_backend.cpp:45 failed to load model: yolov4_nvidia, nvinfer error:NVDSINFER_TRTIS_ERROR
ERROR: infer_trtis_backend.cpp:184 failed to initialize backend while ensuring model:yolov4_nvidia ready, nvinfer error:NVDSINFER_TRTIS_ERROR
0:00:14.399726167 260 0x564fdec902f0 ERROR nvinferserver gstnvinferserver.cpp:362:gst_nvinfer_server_logger:<primary_gie> nvinferserver[UID 1]: Error in createNNBackend() <infer_trtis_context.cpp:246> [UID = 1]: failed to initialize trtis backend for model:yolov4_nvidia, nvinfer error:NVDSINFER_TRTIS_ERROR
I0309 23:25:22.300489 260 server.cc:280] Waiting for in-flight requests to complete.
I0309 23:25:22.300497 260 server.cc:295] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
0:00:14.399831360 260 0x564fdec902f0 ERROR nvinferserver gstnvinferserver.cpp:362:gst_nvinfer_server_logger:<primary_gie> nvinferserver[UID 1]: Error in initialize() <infer_base_context.cpp:81> [UID = 1]: create nn-backend failed, check config file settings, nvinfer error:NVDSINFER_TRTIS_ERROR
0:00:14.399843072 260 0x564fdec902f0 WARN nvinferserver gstnvinferserver_impl.cpp:439:start:<primary_gie> error: Failed to initialize InferTrtIsContext
0:00:14.399868241 260 0x564fdec902f0 WARN nvinferserver gstnvinferserver_impl.cpp:439:start:<primary_gie> error: Config file path: /workspace/Deepstream_5.1_Triton/samples/configs/deepstream-app-trtis/config_infer_primary_yolov4.txt
0:00:14.400284532 260 0x564fdec902f0 WARN nvinferserver gstnvinferserver.cpp:460:gst_nvinfer_server_start:<primary_gie> error: gstnvinferserver_impl start failed
** ERROR: <main:655>: Failed to set pipeline to PAUSED
Quitting
ERROR from primary_gie: Failed to initialize InferTrtIsContext
Debug info: gstnvinferserver_impl.cpp(439): start (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInferServer:primary_gie:
Config file path: /workspace/Deepstream_5.1_Triton/samples/configs/deepstream-app-trtis/config_infer_primary_yolov4.txt
ERROR from primary_gie: gstnvinferserver_impl start failed
Debug info: gstnvinferserver.cpp(460): gst_nvinfer_server_start (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInferServer:primary_gie
App run failed
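The autofill error goes away once every input and output tensor is listed explicitly in the model's config.pbtxt (as the first comment below confirms). A sketch of such a config for the model repository, assuming the tensor names and output dims of the Tianxiaomo pytorch-YOLOv4 export ('input', 'boxes', 'confs') — verify these against the actual engine before use:

```
# Hedged sketch of trtis_model_repo/yolov4_nvidia/config.pbtxt
# Tensor names and output dims are assumptions, not taken from this issue.
name: "yolov4_nvidia"
platform: "tensorrt_plan"
max_batch_size: 8
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 608, 608 ]
  }
]
output [
  {
    name: "boxes"
    data_type: TYPE_FP32
    dims: [ -1, 1, 4 ]
  },
  {
    name: "confs"
    data_type: TYPE_FP32
    dims: [ -1, 80 ]
  }
]
```

With strict_model_config=0, Triton tries to autofill any missing tensor entries; mixing some tensors with dims and some without is what triggers the "either all model tensor configuration should specify their dims or none" error.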
I think the problem is with trtexec. Is there a sample/tool that shows how to optimize a YOLO PyTorch-to-ONNX model into a TensorRT engine in INT8 mode, with full INT8 calibration and dynamic input shapes?
Issue Analytics
- Created: 3 years ago
- Comments: 11 (5 by maintainers)
Top GitHub Comments
Hi @deadeyegoodwin, thanks for your prompt support. I listed all the input/output tensors and the model loaded successfully. Now, when I deploy the TensorRT engine in INT8 mode with dynamic batching via the DS-Triton integration, I am facing some performance issues; please see below:
Performance results below. For some reason the batch size is reset to 1:
WARNING from primary_gie: Configuration file batch-size reset to: 1
Convert ONNX --> TRT INT8 with dynamic batch size:
/usr/src/tensorrt/bin/trtexec --onnx=yolov4_-1_3_608_608_dynamic.onnx --explicitBatch --minShapes=\'input\':1x3x608x608 --optShapes=\'input\':2x3x608x608 --maxShapes=\'input\':8x3x608x608 --workspace=4096 --buildOnly --saveEngine=yolov4_-1_3_608_608_dynamic_onnx_int8_trtexec_3.engine --int8
Test: BS=1 | count:1 | PERF: 228.29 (228.06)
Test: BS=1 | count:8 | PERF: 229.84 (229.77)
Test: BS=2 | count:1 | Batch size error
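The "batch-size reset to: 1" warning typically means the batch size requested on the DeepStream side exceeds what the loaded model allows. A hedged fragment of the nvinferserver config (config_infer_primary_yolov4.txt), with field names per the DeepStream 5.x nvinferserver protobuf format as I understand it — paths and values are illustrative assumptions:

```
# Hedged sketch: max_batch_size here must not exceed max_batch_size in
# Triton's config.pbtxt, or DeepStream resets it to 1.
infer_config {
  unique_id: 1
  gpu_ids: [0]
  max_batch_size: 8
  backend {
    trt_is {
      model_name: "yolov4_nvidia"
      version: -1
      model_repo {
        root: "../../trtis_model_repo"
        log_level: 2
      }
    }
  }
}
```

The batch-size in the deepstream-app config's [primary-gie] section should be kept consistent with this value as well.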
Any suggestions or recommendations for debugging these performance issues? I want to take advantage of the concurrent-execution and dynamic batching features to boost performance.
@vilmara I see the DeepStream/Triton pipeline ran with the yolov4 model; did it produce correct outputs in the output video? I'm getting rectangles at random places in the video, whereas it runs correctly without Triton.