
[Question] Setting dynamic batching with warmup


Hi, I’m trying to deploy an MMDetection yolox-s model that I converted to an end2end.engine file using MMDeploy. I added the libmmdeploy_tensorrt_ops.so that was used to generate the engine file to my Triton docker image and load it with LD_PRELOAD.
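For context, inside the container this amounts to roughly the sketch below before the server starts (the plugin path is an assumption; /opt/ml/model matches the repository path in the log further down):

# Sketch (plugin path is an assumption): preload the MMDeploy TensorRT
# plugin so the custom ops in end2end.engine can be deserialized when
# Triton loads the model.
export LD_PRELOAD=/opt/plugins/libmmdeploy_tensorrt_ops.so
tritonserver --model-repository=/opt/ml/model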

I attempted to set my config.pbtxt like this:

name: "yolox"
platform: "tensorrt_plan"
max_batch_size: 8
input {
  name: "input"
  data_type: TYPE_FP32
  dims: [ 3, 800, 1344 ]
}
output [
  {
    name: "dets"
    data_type: TYPE_FP32
    dims: [ 100, 5 ]
  },    
  {
    name: "labels"
    data_type: TYPE_INT32
    dims: [ 100 ]
  }    
]
instance_group {
  count: 1
  kind: KIND_GPU
}

dynamic_batching {
}

model_warmup {
    name: "warmup"
    batch_size: 8
    inputs: {
        key: "input"
        value: {
            data_type: TYPE_FP32
            dims: [ 3, 800, 1344 ]
            zero_data: false
        }
    }
}

default_model_filename: "end2end.engine"

However, this gives me warmup errors:

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 22.04 (build 36821869)
Triton Server Version 2.21.0

Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

NOTE: CUDA Forward Compatibility mode ENABLED.
  Using CUDA 11.6 driver version 510.47.03 with kernel driver version 450.142.00.
  See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

WARNING: No SAGEMAKER_TRITON_DEFAULT_MODEL_NAME provided.
         Starting with the only existing model directory yolox
I0513 00:13:21.821294 91 libtorch.cc:1381] TRITONBACKEND_Initialize: pytorch
I0513 00:13:21.821390 91 libtorch.cc:1391] Triton TRITONBACKEND API version: 1.9
I0513 00:13:21.821411 91 libtorch.cc:1397] 'pytorch' TRITONBACKEND API version: 1.9
2022-05-13 00:13:22.037748: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0513 00:13:22.082952 91 tensorflow.cc:2181] TRITONBACKEND_Initialize: tensorflow
I0513 00:13:22.082989 91 tensorflow.cc:2191] Triton TRITONBACKEND API version: 1.9
I0513 00:13:22.083006 91 tensorflow.cc:2197] 'tensorflow' TRITONBACKEND API version: 1.9
I0513 00:13:22.083026 91 tensorflow.cc:2221] backend configuration:
{}
I0513 00:13:22.084785 91 onnxruntime.cc:2400] TRITONBACKEND_Initialize: onnxruntime
I0513 00:13:22.084817 91 onnxruntime.cc:2410] Triton TRITONBACKEND API version: 1.9
I0513 00:13:22.084839 91 onnxruntime.cc:2416] 'onnxruntime' TRITONBACKEND API version: 1.9
I0513 00:13:22.084853 91 onnxruntime.cc:2446] backend configuration:
{}
I0513 00:13:22.106429 91 openvino.cc:1207] TRITONBACKEND_Initialize: openvino
I0513 00:13:22.106457 91 openvino.cc:1217] Triton TRITONBACKEND API version: 1.9
I0513 00:13:22.106485 91 openvino.cc:1223] 'openvino' TRITONBACKEND API version: 1.9
I0513 00:13:24.007565 91 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f756c000000' with size 268435456
I0513 00:13:24.008099 91 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0513 00:13:24.010115 91 model_repository_manager.cc:1077] loading: yolox:1
I0513 00:13:24.111009 91 tensorrt.cc:5294] TRITONBACKEND_Initialize: tensorrt
I0513 00:13:24.111048 91 tensorrt.cc:5304] Triton TRITONBACKEND API version: 1.9
I0513 00:13:24.111070 91 tensorrt.cc:5310] 'tensorrt' TRITONBACKEND API version: 1.9
I0513 00:13:24.111160 91 tensorrt.cc:5353] backend configuration:
{}
I0513 00:13:24.111208 91 tensorrt.cc:5405] TRITONBACKEND_ModelInitialize: yolox (version 1)
I0513 00:13:24.112760 91 tensorrt.cc:5454] TRITONBACKEND_ModelInstanceInitialize: yolox_0 (GPU device 0)
I0513 00:13:24.492754 91 logging.cc:49] [MemUsageChange] Init CUDA: CPU +252, GPU +0, now: CPU 1411, GPU 1013 (MiB)
I0513 00:13:24.528970 91 logging.cc:49] Loaded engine size: 21 MiB
I0513 00:13:25.179791 91 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +385, GPU +178, now: CPU 1854, GPU 1215 (MiB)
I0513 00:13:25.356389 91 logging.cc:49] [MemUsageChange] Init cuDNN: CPU +116, GPU +54, now: CPU 1970, GPU 1269 (MiB)
I0513 00:13:25.357843 91 logging.cc:49] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +18, now: CPU 0, GPU 18 (MiB)
I0513 00:13:25.359230 91 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 1927, GPU 1261 (MiB)
I0513 00:13:25.360743 91 logging.cc:49] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1927, GPU 1269 (MiB)
I0513 00:13:25.369938 91 logging.cc:49] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +85, now: CPU 0, GPU 103 (MiB)
I0513 00:13:25.370398 91 tensorrt.cc:1411] Created instance yolox_0 on GPU 0 with stream priority 0 and optimization profile default[0];
E0513 00:13:25.373251 91 backend_model_instance.cc:99] warmup error: Internal - request specifies invalid shape for input 'input' for yolox_0. Error details: model expected the shape of dimension 0 to be between 1 and 1 but received 8
E0513 00:13:25.373287 91 tensorrt.cc:1993] error setting the binding dimension
E0513 00:13:25.373327 91 backend_model_instance.cc:99] warmup error: Internal - request specifies invalid shape for input 'input' for yolox_0. Error details: model expected the shape of dimension 0 to be between 1 and 1 but received 8
E0513 00:13:25.373347 91 tensorrt.cc:1993] error setting the binding dimension
E0513 00:13:25.373376 91 backend_model_instance.cc:99] warmup error: Internal - request specifies invalid shape for input 'input' for yolox_0. Error details: model expected the shape of dimension 0 to be between 1 and 1 but received 8
E0513 00:13:25.373397 91 tensorrt.cc:1993] error setting the binding dimension
E0513 00:13:25.373430 91 backend_model_instance.cc:99] warmup error: Internal - request specifies invalid shape for input 'input' for yolox_0. Error details: model expected the shape of dimension 0 to be between 1 and 1 but received 8
E0513 00:13:25.373443 91 tensorrt.cc:1993] error setting the binding dimension
E0513 00:13:25.373466 91 backend_model_instance.cc:99] warmup error: Internal - request specifies invalid shape for input 'input' for yolox_0. Error details: model expected the shape of dimension 0 to be between 1 and 1 but received 8
E0513 00:13:25.373480 91 tensorrt.cc:1993] error setting the binding dimension
E0513 00:13:25.373494 91 backend_model_instance.cc:99] warmup error: Internal - request specifies invalid shape for input 'input' for yolox_0. Error details: model expected the shape of dimension 0 to be between 1 and 1 but received 8
E0513 00:13:25.373518 91 tensorrt.cc:1993] error setting the binding dimension
E0513 00:13:25.373546 91 backend_model_instance.cc:99] warmup error: Internal - request specifies invalid shape for input 'input' for yolox_0. Error details: model expected the shape of dimension 0 to be between 1 and 1 but received 8
E0513 00:13:25.373562 91 tensorrt.cc:1993] error setting the binding dimension
E0513 00:13:25.373583 91 backend_model_instance.cc:99] warmup error: Internal - request specifies invalid shape for input 'input' for yolox_0. Error details: model expected the shape of dimension 0 to be between 1 and 1 but received 8
E0513 00:13:25.373603 91 tensorrt.cc:1993] error setting the binding dimension
I0513 00:13:25.373817 91 model_repository_manager.cc:1231] successfully loaded 'yolox' version 1
I0513 00:13:25.373923 91 server.cc:549] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0513 00:13:25.374066 91 server.cc:576] 
+-------------+------------------------------------------------------+--------+
| Backend     | Path                                                 | Config |
+-------------+------------------------------------------------------+--------+
| pytorch     | /opt/tritonserver/backends/pytorch/libtriton_pytorch | {}     |
|             | .so                                                  |        |
| tensorflow  | /opt/tritonserver/backends/tensorflow1/libtriton_ten | {}     |
|             | sorflow1.so                                          |        |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onn | {}     |
|             | xruntime.so                                          |        |
| openvino    | /opt/tritonserver/backends/openvino_2021_4/libtriton | {}     |
|             | _openvino_2021_4.so                                  |        |
| tensorrt    | /opt/tritonserver/backends/tensorrt/libtriton_tensor | {}     |
|             | rt.so                                                |        |
+-------------+------------------------------------------------------+--------+

I0513 00:13:25.374118 91 server.cc:619] 
+-------+---------+--------+
| Model | Version | Status |
+-------+---------+--------+
| yolox | 1       | READY  |
+-------+---------+--------+

I0513 00:13:25.374242 91 tritonserver.cc:2123] 
+----------------------------------+------------------------------------------+
| Option                           | Value                                    |
+----------------------------------+------------------------------------------+
| server_id                        | triton                                   |
| server_version                   | 2.21.0                                   |
| server_extensions                | classification sequence model_repository |
|                                  |  model_repository(unload_dependents) sch |
|                                  | edule_policy model_configuration system_ |
|                                  | shared_memory cuda_shared_memory binary_ |
|                                  | tensor_data statistics trace             |
| model_repository_path[0]         | /opt/ml/model/                           |
| model_control_mode               | MODE_EXPLICIT                            |
| startup_models_0                 | yolox                                    |
| strict_model_config              | 1                                        |
| rate_limit                       | OFF                                      |
| pinned_memory_pool_byte_size     | 268435456                                |
| cuda_memory_pool_byte_size{0}    | 67108864                                 |
| response_cache_byte_size         | 0                                        |
| min_supported_compute_capability | 6.0                                      |
| strict_readiness                 | 1                                        |
| exit_timeout                     | 30                                       |
+----------------------------------+------------------------------------------+

I0513 00:13:25.374711 91 sagemaker_server.cc:136] Started Sagemaker HTTPService at 0.0.0.0:8080

Can anyone tell me how I should be setting each dimension to make use of dynamic batching? I’ve attempted several different combinations of values, but can’t seem to get it right.
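For what it’s worth, the config itself looks internally consistent: with max_batch_size set, Triton prepends the batch dimension, so dims: [ 3, 800, 1344 ] plus a warmup batch_size of 8 is the right way to describe a batch of 8. The error “expected the shape of dimension 0 to be between 1 and 1” points at the engine instead: it was serialized with an optimization profile that only admits batch 1, so no config.pbtxt setting can make it accept 8. The fix is to rebuild the engine with a dynamic batch dimension, either through one of MMDeploy’s dynamic TensorRT deploy configs at conversion time or directly with trtexec. The trtexec sketch below is an illustration, not a command from the thread; the ONNX filename and plugin path are assumptions:

# Sketch (filenames and paths are assumptions): rebuild the engine with an
# optimization profile spanning batch sizes 1..8 on the 'input' binding.
# --plugins loads the MMDeploy custom ops needed to parse the model.
trtexec --onnx=end2end.onnx \
        --plugins=/path/to/libmmdeploy_tensorrt_ops.so \
        --minShapes=input:1x3x800x1344 \
        --optShapes=input:4x3x800x1344 \
        --maxShapes=input:8x3x800x1344 \
        --saveEngine=end2end.engine

With an engine built over that profile, the config.pbtxt shown above, including the dynamic_batching and model_warmup blocks, should load without the warmup errors.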

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 16 (8 by maintainers)

Top GitHub Comments

1 reaction
austinmw commented, Jun 2, 2022

I think the MMDeploy docker container I built had some version mismatch, but it didn’t fail to build. I don’t think I have the previous Dockerfile anymore to check the exact error, but when running trtexec inference I was able to see the error in detail, which I couldn’t see with Triton’s logging.
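For anyone debugging similarly: running the serialized engine directly through trtexec surfaces TensorRT’s own diagnostics outside of Triton. A sketch, with assumed paths:

# Sketch (paths are assumptions): load the serialized engine outside Triton
# and probe it with a batch of 8 so TensorRT reports shape errors directly.
LD_PRELOAD=/path/to/libmmdeploy_tensorrt_ops.so \
trtexec --loadEngine=end2end.engine \
        --shapes=input:8x3x800x1344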

1 reaction
tanmayv25 commented, May 18, 2022

> This means we would need to look at the shape somehow in the model.
>
> I also tried to use polygraphy inspect, but got an IPluginCreator error, and prepending LD_PRELOAD=/root/workspace/mmdeploy/build/lib/libmmdeploy_mmdet.so didn’t seem to work as it does with tritonserver.

You can try using --plugins, e.g.:

polygraphy inspect model my_model.engine --plugins /path/to/plugins.so
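Applied to this thread’s setup, that would presumably point at the TensorRT ops library rather than libmmdeploy_mmdet.so (which may be why the LD_PRELOAD attempt quoted above didn’t help); the exact path below is an assumption based on the build directory mentioned in that quote:

polygraphy inspect model end2end.engine \
    --plugins /root/workspace/mmdeploy/build/lib/libmmdeploy_tensorrt_ops.so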

