
Failed to set cuda graph shape when I set max_batch_size==0


Description

CUDA graph capture fails when I set max_batch_size == 0.

Triton Information

What version of Triton are you using? 22.04

Are you using the Triton container or did you build it yourself? nvcr.io/nvidia/tritonserver:22.04-py3

To Reproduce

Model: I used the pretrained PyTorch ResNet18 model and converted it to an ONNX model:

import argparse

import torch
import torchvision.models as models

parser = argparse.ArgumentParser()
parser.add_argument("--output_model", type=str, required=True, help="model output path")

def main():
    args = parser.parse_args()
    output_model_path = args.output_model
    # Load the pretrained ResNet18 and put it in inference mode on the GPU.
    model = models.resnet18(pretrained=True)
    model = model.to('cuda:0')
    model.eval()
    # Dummy input; dimension 0 (batch) is marked dynamic via dynamic_axes below.
    x = torch.ones(1, 3, 224, 224).to('cuda:0')
    torch.onnx.export(
        model=model,
        args=x,
        f=output_model_path,
        opset_version=11,
        export_params=True,
        do_constant_folding=True,
        input_names=['INPUT__0'],
        output_names=['OUTPUT__0'],
        dynamic_axes={'INPUT__0': {0: 'bs'}, 'OUTPUT__0': {0: 'bs'}},
    )

if __name__ == '__main__':
    main()
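
For reference, the script above can be run like this (the script filename export_resnet18.py is only an assumption):

python export_resnet18.py --output_model resnet18.onnx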

Then I converted it to a TensorRT plan file:

trtexec --onnx=resnet18.onnx --explicitBatch --optShapes=INPUT__0:5x3x224x224 --buildOnly --saveEngine=resnet18.plan --workspace=12288 --device=1
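
Before loading it in Triton, the plan can be sanity-checked with the TensorRT Python API. A minimal sketch (the filename and the TensorRT version shipped in the 22.04 container are assumptions) that prints each binding's shape:

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
with open("resnet18.plan", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

for i in range(engine.num_bindings):
    # INPUT__0 prints with 4 dims (the batch dim shows as -1 because it was
    # exported as dynamic); the error further below occurs because Triton
    # tries to set a 5-dim shape against this 4-dim engine binding.
    print(engine.get_binding_name(i), engine.get_binding_shape(i))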

Triton Server config

My tritonserver config:

platform: "tensorrt_plan"
max_batch_size : 0
input: [
    {
        name: "INPUT__0",
        data_type: TYPE_FP32,
        dims: [5, 3, 224, 224],
    }
],
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [0]
  }
]
optimization{
  graph: {
      level : 1
  },
  eager_batching : 1,
  cuda: {
    graphs:1,
    graph_spec: [
      {
        input: {
            key: "INPUT__0",
            value: {dim:[5, 3, 224, 224]}
        }
      }
    ],
    busy_wait_events:1,
    output_copy_stream: 1
  }
}
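
The server was then started in the usual way (the model repository path below is an assumption):

tritonserver --model-repository=/models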

Result

When I run tritonserver, I get the following error:

I0823 14:33:25.438942 8964 tensorrt.cc:3193] Detected INPUT__0 as execution binding for resnet_0
I0823 14:33:25.438952 8964 tensorrt.cc:3193] Detected OUTPUT__0 as execution binding for resnet_0
E0823 14:33:25.454761 8964 logging.cc:43] 3: [executionContext.cpp::setBindingDimensions::945] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::945, condition: engineDims.nbDims == dimensions.nbDims
)
E0823 14:33:25.454795 8964 tensorrt.cc:5090] Failed to set cuda graph shape for resnet_0trt failed to set binding dimension to [1,5,3,224,224] for binding 0 for resnet_0
I0823 14:33:25.454810 8964 tensorrt.cc:1426] Created instance resnet_0 on GPU 0 with stream priority 0 and optimization profile default[0];
I0823 14:33:25.454952 8964 backend_model_instance.cc:734] Starting backend thread for resnet_0 at nice 0 on device 0...
I0823 14:33:25.455107 8964 model_repository_manager.cc:1352] successfully loaded 'resnet' version 1

The backend adds a batch dimension, turning the original 5x3x224x224 shape into 1x5x3x224x224. I found that the TensorRT backend prepends this dimension when max_batch_size equals 0: https://github.com/triton-inference-server/tensorrt_backend/blob/main/src/tensorrt.cc#L5348. I also tried setting batch_size in the graph_spec to 5, but then Triton reports an error while validating the configuration. That configuration is as follows:

platform: "tensorrt_plan"
max_batch_size : 0
input: [
    {
        name: "INPUT__0",
        data_type: TYPE_FP32,
        dims: [5, 3, 224, 224],
    }
],
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [0]
  }
]
optimization{
  graph: {
      level : 1
  },
  eager_batching : 1,
  cuda: {
    graphs:1,
    graph_spec: [
      {
        batch_size:5,
        input: {
            key: "INPUT__0",
            value: {dim:[3, 224, 224]}
        }
      }
    ],
    busy_wait_events:1,
    output_copy_stream: 1
  }
}

Expected behavior

We don't want to use the dynamic batcher, so we need to set max_batch_size to 0, and we also need to use CUDA graphs. How should we configure these two features together?
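
For context, this is how we call the model when max_batch_size is 0. A minimal client sketch (assuming tritonclient[http] is installed and the server listens on localhost:8000), where the client sends the full 5x3x224x224 tensor itself:

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
# With max_batch_size = 0 there is no implicit batch dimension, so the
# request shape must match the engine's full input shape exactly.
data = np.random.rand(5, 3, 224, 224).astype(np.float32)
inp = httpclient.InferInput("INPUT__0", list(data.shape), "FP32")
inp.set_data_from_numpy(data)
result = client.infer(model_name="resnet", inputs=[inp])
print(result.as_numpy("OUTPUT__0").shape)  # expected: (5, 1000)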

Thanks.

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
tanmayv25 commented, Sep 23, 2022

Should be fixed by https://github.com/triton-inference-server/tensorrt_backend/pull/48. The test is added here: https://github.com/triton-inference-server/server/pull/4913. The fix will be officially available in the Triton 22.10 release.

0 reactions
wangchengdng commented, Aug 25, 2022

> Is Triton able to load the model when not using cuda graph and other optimizations? Can you provide no config.pbtxt for the model and share what config Triton generates/autocompletes for the model?

@tanmayv25 Yes, Triton can successfully load the model whether CUDA graphs are enabled or not. The config Triton autocompletes when I provide no config.pbtxt is as follows:

{
    "name": "resnet",
    "platform": "tensorrt_plan",
    "backend": "tensorrt",
    "version_policy": {
        "latest": {
            "num_versions": 1
        }
    },
    "max_batch_size": 5,
    "input": [
        {
            "name": "INPUT__0",
            "data_type": "TYPE_FP32",
            "dims": [
                3,
                224,
                224
            ],
            "is_shape_tensor": false
        }
    ],
    "output": [
        {
            "name": "OUTPUT__0",
            "data_type": "TYPE_FP32",
            "dims": [
                1000
            ],
            "is_shape_tensor": false
        }
    ],
    "batch_input": [],
    "batch_output": [],
    "optimization": {
        "priority": "PRIORITY_DEFAULT",
        "input_pinned_memory": {
            "enable": true
        },
        "output_pinned_memory": {
            "enable": true
        },
        "gather_kernel_buffer_threshold": 0,
        "eager_batching": false
    },
    "instance_group": [
        {
            "name": "resnet",
            "kind": "KIND_GPU",
            "count": 1,
            "gpus": [
                0,
                1,
                2,
                3
            ],
            "secondary_devices": [],
            "profile": [],
            "passive": false,
            "host_policy": ""
        }
    ],
    "default_model_filename": "model.plan",
    "cc_model_filenames": {},
    "metric_tags": {},
    "parameters": {},
    "model_warmup": [],
    "dynamic_batching": {}
}
