Failed to set cuda graph shape when I set max_batch_size==0

Description: CUDA graph setup fails when I set max_batch_size to 0.

Triton Information What version of Triton are you using? 22.04

Are you using the Triton container or did you build it yourself? nvcr.io/nvidia/tritonserver:22.04-py3

To Reproduce: I used the PyTorch ResNet18 pretrained model and converted it to an ONNX model:

import argparse

import torch
import torchvision.models as models

parser = argparse.ArgumentParser()
parser.add_argument("--output_model", type=str, required=True, help="model output path")

def main():
    args = parser.parse_args()
    output_model_path = args.output_model
    model = models.resnet18()
    model = model.to('cuda:0')
    model.eval()
    x = torch.ones(1, 3, 224, 224).to('cuda:0')
    torch.onnx.export(
            model=model,
            args=x,
            f=output_model_path,
            opset_version=11,
            export_params=True,
            do_constant_folding=True,
            input_names = ['INPUT__0'],
            output_names = ['OUTPUT__0'],
            dynamic_axes={'INPUT__0' : {0:'bs'}, 'OUTPUT__0' : {0:'bs'}}
        )

if __name__ == '__main__':
    main()

Then I converted it to a TensorRT plan file:

trtexec --onnx=resnet18.onnx --explicitBatch --optShapes=INPUT__0:5x3x224x224 --buildOnly --saveEngine=resnet18.plan --workspace=12288 --device=1

Triton server config: my model configuration is as follows:

platform: "tensorrt_plan"
max_batch_size : 0
input: [
    {
        name: "INPUT__0",
        data_type: TYPE_FP32,
        dims: [5, 3, 224, 224],
    }
],
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [0]
  }
]
optimization{
  graph: {
      level : 1
  },
  eager_batching : 1,
  cuda: {
    graphs:1,
    graph_spec: [
      {
        input: {
            key: "INPUT__0",
            value: {dim:[5, 3, 224, 224]}
        }
      }
    ],
    busy_wait_events:1,
    output_copy_stream: 1
  }
}

Result: when I run tritonserver, I get the following error:

I0823 14:33:25.438942 8964 tensorrt.cc:3193] Detected INPUT__0 as execution binding for resnet_0
I0823 14:33:25.438952 8964 tensorrt.cc:3193] Detected OUTPUT__0 as execution binding for resnet_0
E0823 14:33:25.454761 8964 logging.cc:43] 3: [executionContext.cpp::setBindingDimensions::945] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::945, condition: engineDims.nbDims == dimensions.nbDims
)
E0823 14:33:25.454795 8964 tensorrt.cc:5090] Failed to set cuda graph shape for resnet_0trt failed to set binding dimension to [1,5,3,224,224] for binding 0 for resnet_0
I0823 14:33:25.454810 8964 tensorrt.cc:1426] Created instance resnet_0 on GPU 0 with stream priority 0 and optimization profile default[0];
I0823 14:33:25.454952 8964 backend_model_instance.cc:734] Starting backend thread for resnet_0 at nice 0 on device 0...
I0823 14:33:25.455107 8964 model_repository_manager.cc:1352] successfully loaded 'resnet' version 1

The backend adds a batch dimension, turning the original 5x3x224x224 shape into 1x5x3x224x224, which no longer matches the rank of the engine binding. I found that the TensorRT backend prepends this dimension when max_batch_size is equal to 0: https://github.com/triton-inference-server/tensorrt_backend/blob/main/src/tensorrt.cc#L5348. I also tried setting batch_size to 5 in the CUDA graph spec, but that fails when the configuration is validated. The configuration is as follows:

platform: "tensorrt_plan"
max_batch_size : 0
input: [
    {
        name: "INPUT__0",
        data_type: TYPE_FP32,
        dims: [5, 3, 224, 224],
    }
],
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [0]
  }
]
optimization{
  graph: {
      level : 1
  },
  eager_batching : 1,
  cuda: {
    graphs:1,
    graph_spec: [
      {
        batch_size:5,
        input: {
            key: "INPUT__0",
            value: {dim:[3, 224, 224]}
        }
      }
    ],
    busy_wait_events:1,
    output_copy_stream: 1
  }
}
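
The rank mismatch reported in the log can be illustrated with a few lines of Python. This is only a sketch of the observed behavior, not the backend's actual code: the helper names are hypothetical, and the engine shape [-1, 3, 224, 224] is an assumption based on the dynamic batch axis declared in the ONNX export above.

```python
from typing import List


def dims_sent_to_engine(graph_spec_dims: List[int]) -> List[int]:
    """Model the behavior seen in the log: a leading dimension of 1 is
    prepended to the graph_spec dims when max_batch_size is 0.
    Hypothetical helper, not the backend's real code."""
    return [1] + list(graph_spec_dims)


def rank_matches(engine_dims: List[int], requested_dims: List[int]) -> bool:
    """Mirror the failing check: engineDims.nbDims == dimensions.nbDims."""
    return len(engine_dims) == len(requested_dims)


# Assumed engine binding shape (dynamic batch axis from the ONNX export).
engine_dims = [-1, 3, 224, 224]
# Dims given in graph_spec of the first configuration.
graph_spec_dims = [5, 3, 224, 224]

requested = dims_sent_to_engine(graph_spec_dims)
print(requested)                                   # shape from the error message
print(rank_matches(engine_dims, requested))        # rank 4 vs rank 5
print(rank_matches(engine_dims, graph_spec_dims))  # rank 4 vs rank 4
```

Under these assumptions, the dims written in graph_spec already have the same rank as the engine binding, and it is the extra leading dimension that makes the check fail.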

Expected behavior: We don’t want to use the dynamic batcher, so we need to set max_batch_size to 0, but we also need to use CUDA graphs. How should we configure these two features together?

thanks

Top GitHub Comments

tanmayv25 commented, Sep 23, 2022

Should be fixed by https://github.com/triton-inference-server/tensorrt_backend/pull/48. The test is added here: https://github.com/triton-inference-server/server/pull/4913. The fix will be officially available in the Triton 22.10 release.

wangchengdng commented, Aug 25, 2022

Is Triton able to load the model when not using cuda graph and other optimizations? Can you provide no config.pbtxt for the model and share what config Triton generates/autocompletes for the model?

@tanmayv25 Yes, Triton can successfully load the model whether CUDA graph is enabled or not. The config Triton generates when I provide no config.pbtxt is as follows:

{
    "name": "resnet",
    "platform": "tensorrt_plan",
    "backend": "tensorrt",
    "version_policy": {
        "latest": {
            "num_versions": 1
        }
    },
    "max_batch_size": 5,
    "input": [
        {
            "name": "INPUT__0",
            "data_type": "TYPE_FP32",
            "dims": [
                3,
                224,
                224
            ],
            "is_shape_tensor": false
        }
    ],
    "output": [
        {
            "name": "OUTPUT__0",
            "data_type": "TYPE_FP32",
            "dims": [
                1000
            ],
            "is_shape_tensor": false
        }
    ],
    "batch_input": [],
    "batch_output": [],
    "optimization": {
        "priority": "PRIORITY_DEFAULT",
        "input_pinned_memory": {
            "enable": true
        },
        "output_pinned_memory": {
            "enable": true
        },
        "gather_kernel_buffer_threshold": 0,
        "eager_batching": false
    },
    "instance_group": [
        {
            "name": "resnet",
            "kind": "KIND_GPU",
            "count": 1,
            "gpus": [
                0,
                1,
                2,
                3
            ],
            "secondary_devices": [],
            "profile": [],
            "passive": false,
            "host_policy": ""
        }
    ],
    "default_model_filename": "model.plan",
    "cc_model_filenames": {},
    "metric_tags": {},
    "parameters": {},
    "model_warmup": [],
    "dynamic_batching": {}
}
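
For anyone who wants to pull the same auto-completed configuration from a running server, here is a minimal sketch that uses Triton's HTTP model configuration endpoint (v2/models/<name>/config). The host localhost:8000 and the helper names are assumptions for illustration; the model name "resnet" is taken from the log above.

```python
import json
import urllib.request


def model_config_url(model_name: str, host: str = "localhost:8000") -> str:
    """Build the URL of the HTTP model configuration endpoint.
    The default host is an assumption (Triton's default HTTP port)."""
    return f"http://{host}/v2/models/{model_name}/config"


def fetch_model_config(model_name: str, host: str = "localhost:8000") -> dict:
    """Return the configuration the server reports for model_name.
    Requires a running Triton server that has loaded the model."""
    with urllib.request.urlopen(model_config_url(model_name, host)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(model_config_url("resnet"))
    # Uncomment when a server is reachable:
    # config = fetch_model_config("resnet")
    # print(config["max_batch_size"], config["input"][0]["dims"])
```

Comparing max_batch_size and the input dims in the returned JSON against a hand-written config.pbtxt makes it easier to see where the two disagree about the batch dimension.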