Failed to set cuda graph shape when I set max_batch_size==0
Description: Setting the CUDA graph shape fails when I set max_batch_size to 0.
Triton Information What version of Triton are you using? 22.04
Are you using the Triton container or did you build it yourself? nvcr.io/nvidia/tritonserver:22.04-py3
To Reproduce. Model: I used the PyTorch ResNet18 pretrained model and converted it to an ONNX model:
import torch
from torch import nn
import torchvision
import argparse
import torchvision.models as models

parser = argparse.ArgumentParser()
parser.add_argument("--output_model", type=str, required=True, help="model output path")


def main():
    args = parser.parse_args()
    output_model_path = args.output_model
    model = models.resnet18()
    model = model.to('cuda:0')
    model.eval()
    x = torch.ones(1, 3, 224, 224).to('cuda:0')
    torch.onnx.export(
        model=model,
        args=x,
        f=output_model_path,
        opset_version=11,
        export_params=True,
        do_constant_folding=True,
        input_names=['INPUT__0'],
        output_names=['OUTPUT__0'],
        dynamic_axes={'INPUT__0': {0: 'bs'}, 'OUTPUT__0': {0: 'bs'}}
    )


if __name__ == '__main__':
    main()
Then I converted it to a TensorRT plan file:
trtexec --onnx=resnet18.onnx --explicitBatch --optShapes=INPUT__0:5x3x224x224 --buildOnly --saveEngine=resnet18.plan --workspace=12288 --device=1
Triton server config. My model configuration is:
platform: "tensorrt_plan"
max_batch_size : 0
input: [
  {
    name: "INPUT__0",
    data_type: TYPE_FP32,
    dims: [5, 3, 224, 224],
  }
],
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [0]
  }
]
optimization{
  graph: {
    level : 1
  },
  eager_batching : 1,
  cuda: {
    graphs:1,
    graph_spec: [
      {
        input: {
          key: "INPUT__0",
          value: {dim:[5, 3, 224, 224]}
        }
      }
    ],
    busy_wait_events:1,
    output_copy_stream: 1
  }
}
Result: when I run tritonserver, I get the following error:
I0823 14:33:25.438942 8964 tensorrt.cc:3193] Detected INPUT__0 as execution binding for resnet_0
I0823 14:33:25.438952 8964 tensorrt.cc:3193] Detected OUTPUT__0 as execution binding for resnet_0
E0823 14:33:25.454761 8964 logging.cc:43] 3: [executionContext.cpp::setBindingDimensions::945] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::945, condition: engineDims.nbDims == dimensions.nbDims
)
E0823 14:33:25.454795 8964 tensorrt.cc:5090] Failed to set cuda graph shape for resnet_0trt failed to set binding dimension to [1,5,3,224,224] for binding 0 for resnet_0
I0823 14:33:25.454810 8964 tensorrt.cc:1426] Created instance resnet_0 on GPU 0 with stream priority 0 and optimization profile default[0];
I0823 14:33:25.454952 8964 backend_model_instance.cc:734] Starting backend thread for resnet_0 at nice 0 on device 0...
I0823 14:33:25.455107 8964 model_repository_manager.cc:1352] successfully loaded 'resnet' version 1
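The log shows a rank-5 shape, [1,5,3,224,224], being set on a binding that the engine defines with four dimensions, which is what the `engineDims.nbDims == dimensions.nbDims` check rejects. A minimal sketch of that mismatch follows. It is an illustrative model only, not the TensorRT backend's code: `graph_shape_for_capture` and `ranks_match` are hypothetical helpers, and the engine shape `[-1, 3, 224, 224]` is assumed from the dynamic batch axis used in the ONNX export.

```python
# Illustrative model of the shape check only; this is NOT the TensorRT
# backend's code. Both helpers below are hypothetical.

def graph_shape_for_capture(spec_dims, prepend_batch_dim, batch_size=1):
    """Return the shape that would be handed to the execution context."""
    if prepend_batch_dim:
        return [batch_size] + list(spec_dims)
    return list(spec_dims)


def ranks_match(engine_dims, requested_dims):
    """Model of the failed condition: engineDims.nbDims == dimensions.nbDims."""
    return len(engine_dims) == len(requested_dims)


# Assumed engine binding shape: rank 4, with -1 marking the dynamic batch axis.
engine_dims = [-1, 3, 224, 224]
# graph_spec dims taken from the config above.
spec_dims = [5, 3, 224, 224]

# Behaviour seen in the log: a leading 1 is prepended, giving rank 5.
observed = graph_shape_for_capture(spec_dims, prepend_batch_dim=True)
# Behaviour a non-batching model needs: the dims are used unchanged.
wanted = graph_shape_for_capture(spec_dims, prepend_batch_dim=False)

print(observed, ranks_match(engine_dims, observed))
print(wanted, ranks_match(engine_dims, wanted))
```

With the extra leading dimension the ranks differ (5 versus 4), so the binding dimensions cannot be set; with the dims passed through unchanged the ranks agree.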
The backend adds a batch dimension, so the original shape 5x3x224x224 becomes 1x5x3x224x224. I found that the TensorRT backend prepends this dimension when max_batch_size is equal to 0: https://github.com/triton-inference-server/tensorrt_backend/blob/main/src/tensorrt.cc#L5348. I also tried setting batch_size to 5 in the CUDA graph spec, but that produces an error when the configuration is validated. The configuration is as follows:
platform: "tensorrt_plan"
max_batch_size : 0
input: [
  {
    name: "INPUT__0",
    data_type: TYPE_FP32,
    dims: [5, 3, 224, 224],
  }
],
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [0]
  }
]
optimization{
  graph: {
    level : 1
  },
  eager_batching : 1,
  cuda: {
    graphs:1,
    graph_spec: [
      {
        batch_size:5,
        input: {
          key: "INPUT__0",
          value: {dim:[3, 224, 224]}
        }
      }
    ],
    busy_wait_events:1,
    output_copy_stream: 1
  }
}
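The report does not quote the validation error for this second configuration, so the rule sketched below is an assumption about what the check enforces: a model with max_batch_size 0 has no batch dimension, so a graph spec that sets a non-zero batch_size is rejected. `check_graph_spec` is a hypothetical helper, and its message text is made up for illustration; it is not Triton's validator or its wording.

```python
# Hypothetical validator illustrating an ASSUMED rule; it is not Triton's
# actual configuration check and the message text is invented.

def check_graph_spec(max_batch_size, spec_batch_size):
    """Return a list of problems; an empty list means the spec passes this sketch."""
    problems = []
    if max_batch_size == 0 and spec_batch_size != 0:
        problems.append(
            "graph_spec batch_size must be 0 (unset) when max_batch_size is 0"
        )
    return problems


# Second configuration from the report: max_batch_size 0, batch_size 5.
second_config = check_graph_spec(max_batch_size=0, spec_batch_size=5)
# First configuration: max_batch_size 0, batch_size left unset (0).
first_config = check_graph_spec(max_batch_size=0, spec_batch_size=0)

print(second_config)
print(first_config)
```

Under this assumed rule, only the first configuration (full dims in the graph spec, no batch_size) is a valid way to describe a non-batching model, which is why the extra prepended dimension is the part that has to change.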
Expected behavior: We don't want to use the dynamic batcher, so we need to set max_batch_size to 0, and we also need to use CUDA graphs. How should we configure these two features together?
thanks
Top GitHub Comments
This should be fixed by https://github.com/triton-inference-server/tensorrt_backend/pull/48. The test is added here: https://github.com/triton-inference-server/server/pull/4913. The fix will be officially available in the Triton 22.10 release.
@tanmayv25 Yes, Triton can successfully load the model whether CUDA graph is enabled or not. The config when I provide no config.pbtxt is as follows