
Does max_batch_size in config.pbtxt refer to the model batch size or the request batch size?

See original GitHub issue

From what I understand, the docs on max_batch_size seem to indicate that it refers to the batching of request objects.

Reading the backend code for ONNX Runtime seems to confirm this as well.

Code snippet from onnxruntime.cc => ModelInstanceState::ProcessRequests:

  const int max_batch_size = model_state_->MaxBatchSize();

  // For each request collect the total batch size for this inference
  // execution. The batch-size, number of inputs, and size of each
  // input has already been checked so don't need to do that here.
  size_t total_batch_size = 0;
  for (size_t i = 0; i < request_count; i++) {
    // If we get a nullptr request then something is badly wrong. Fail
    // and release all requests.
    if (requests[i] == nullptr) {
      RequestsRespondWithError(
          requests, request_count,
          TRITONSERVER_ErrorNew(
              TRITONSERVER_ERROR_INTERNAL,
              std::string(
                  "null request given to ONNX Runtime backend for '" + Name() +
                  "'")
                  .c_str()));
      return;
    }

    if (max_batch_size > 0) {
      // Retrieve the batch size from one of the inputs, if the model
      // supports batching, the first dimension size is batch size
      TRITONBACKEND_Input* input;
      TRITONSERVER_Error* err =
          TRITONBACKEND_RequestInputByIndex(requests[i], 0 /* index */, &input);
      if (err == nullptr) {
        const int64_t* shape;
        err = TRITONBACKEND_InputProperties(
            input, nullptr, nullptr, &shape, nullptr, nullptr, nullptr);
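        // shape[0] is this request's batch dimension; summing it across all
        // requests in this execution yields the batch size the model will run with.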
        total_batch_size += shape[0];
      }
      if (err != nullptr) {
        RequestsRespondWithError(requests, request_count, err);
        return;
      }
    } else {
      total_batch_size += 1;
    }
  }

  // If there are no valid payloads then no need to run the inference.
  if (total_batch_size == 0) {
    return;
  }

However, the docs also mention the paragraph below, which seems to me to say that max_batch_size refers to the batch size of the input tensor being fed into the model rather than the request batch size?

Input and output shapes are specified by a combination of max_batch_size and the dimensions specified by the input or output dims property. For models with max_batch_size greater-than 0, the full shape is formed as [ -1 ] + dims. For models with max_batch_size equal to 0, the full shape is formed as dims. For example, for the following configuration the shape of “input0” is [ -1, 16 ] and the shape of “output0” is [ -1, 4 ].
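The configuration referenced by that example is not reproduced above; a config.pbtxt along the lines of the sketch below would produce those shapes (the data types and the particular max_batch_size value are assumptions, since the quoted paragraph does not include the full file):

  max_batch_size: 8
  input [
    {
      name: "input0"
      data_type: TYPE_FP32
      dims: [ 16 ]
    }
  ]
  output [
    {
      name: "output0"
      data_type: TYPE_FP32
      dims: [ 4 ]
    }
  ]

Because max_batch_size is greater than 0, Triton prepends a variable batch dimension, so the full shape of "input0" is [ -1, 16 ] and the full shape of "output0" is [ -1, 4 ], as the docs describe.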

Hoping someone can help clarify my understanding.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 9 (5 by maintainers)

Top GitHub Comments

7 reactions
tanmayv25 commented, Nov 5, 2021

The max_batch_size in the model config is a property of the model. It indicates the maximum value of the first (batch) dimension that the model can support.

In the Triton client, the batch size is treated just like any other variable dimension. The user can send requests with any supported shape.

To better explain the relation, let's assume max_batch_size is set to 10 and the ONNX model supports shape [-1, 256, 256]. The client can send 4 images in the first request, 2 in the second request and 4 in the third request. Assuming the three requests were sent in quick succession such that the max_queue_delay time for the dynamic batcher had not elapsed, Triton will form a batch from these three requests (the request_count in the backend code above will be 3). The tensor data from the three requests will be merged and the model will execute with an input tensor shaped [10, 256, 256].

The client can also send a single request with 10 images; the backend will then get only one request, since max_batch_size is already reached. In a third case, the client can send requests with a single image each, and the backend may get up to 10 requests batched together.

The Triton dynamic batcher can only grow batches, not split them. Choosing an efficient max_batch_size really depends on your model, but at the very least it should be as large as the batch size a client is expected to send in a single request. You can use model_analyzer to tune these parameters.
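For reference, a config.pbtxt for the scenario described above might look roughly like the sketch below (the input name, data type, preferred batch size, and queue-delay value are illustrative assumptions, not taken from the thread):

  max_batch_size: 10
  input [
    {
      name: "INPUT"
      data_type: TYPE_FP32
      dims: [ 256, 256 ]
    }
  ]
  dynamic_batching {
    preferred_batch_size: [ 10 ]
    max_queue_delay_microseconds: 100
  }

With a configuration along these lines, requests of shape [4, 256, 256], [2, 256, 256], and [4, 256, 256] that arrive within the queue delay can be merged into a single execution with an input tensor of shape [10, 256, 256], as described above.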

2 reactions
yichong96 commented, Nov 8, 2021

Thank you very much. Managed to run model inference with batching with help from both of you! @dyastremsky @tanmayv25

Read more comments on GitHub >

Top Results From Across the Web

nvidia - What is the difference between batch-size, preferred ...
In this example model supported max batch size of 32. And server attempts to create a batch size of 4 and 8 while...

Adaptive Batching - BentoML
Our adaptive batching adapts both the batching window and the max batch size based off of incoming traffic patterns at the time. The...

Benchmarking Triton (TensorRT) Inference Server for Hosting ...
These performance gains diminish as model size and/or batch size grows. ... (2) write a config.pbtxt model configuration file, ...

Triton Inference Server Tutorial 2 - CSDN Blog
This article describes how to write a config file and what can be configured in config.pbtxt ... max batch size = 8: note that if max batch size is greater than 0, the default batch size of the network ...
