Does max_batch_size in config.pbtxt refer to the model batch size or the request batch size?
From what I understand, the docs on max_batch_size seem to indicate that the batching refers to the batching of request objects.
Reading the backend code for the ONNX Runtime backend seems to confirm this as well. Code snippet from onnxruntime.cc, in ModelInstanceState::ProcessRequests:
const int max_batch_size = model_state_->MaxBatchSize();

// For each request collect the total batch size for this inference
// execution. The batch-size, number of inputs, and size of each
// input has already been checked so don't need to do that here.
size_t total_batch_size = 0;
for (size_t i = 0; i < request_count; i++) {
  // If we get a nullptr request then something is badly wrong. Fail
  // and release all requests.
  if (requests[i] == nullptr) {
    RequestsRespondWithError(
        requests, request_count,
        TRITONSERVER_ErrorNew(
            TRITONSERVER_ERROR_INTERNAL,
            std::string(
                "null request given to ONNX Runtime backend for '" + Name() +
                "'")
                .c_str()));
    return;
  }

  if (max_batch_size > 0) {
    // Retrieve the batch size from one of the inputs, if the model
    // supports batching, the first dimension size is batch size
    TRITONBACKEND_Input* input;
    TRITONSERVER_Error* err =
        TRITONBACKEND_RequestInputByIndex(requests[i], 0 /* index */, &input);
    if (err == nullptr) {
      const int64_t* shape;
      err = TRITONBACKEND_InputProperties(
          input, nullptr, nullptr, &shape, nullptr, nullptr, nullptr);
      total_batch_size += shape[0];
    }
    if (err != nullptr) {
      RequestsRespondWithError(requests, request_count, err);
      return;
    }
  } else {
    total_batch_size += 1;
  }
}

// If there are no valid payloads then no need to run the inference.
if (total_batch_size == 0) {
  return;
}
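As a concrete illustration of the accumulation above (with hypothetical numbers): if max_batch_size is 8 and the scheduler delivers two requests whose first input tensors have shapes [3, 16] and [2, 16], the loop sums the leading dimensions and total_batch_size ends up as 5 — it counts samples across requests, not the requests themselves.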
However, the docs also include the paragraph below, which makes it seem to me that max_batch_size refers to the batch size of the input tensor being fed into the model, rather than the request batch size?
Input and output shapes are specified by a combination of max_batch_size and the dimensions specified by the input or output dims property. For models with max_batch_size greater-than 0, the full shape is formed as [ -1 ] + dims. For models with max_batch_size equal to 0, the full shape is formed as dims. For example, for the following configuration the shape of “input0” is [ -1, 16 ] and the shape of “output0” is [ -1, 4 ].
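The configuration that the quoted docs refer to did not survive extraction here. A minimal reconstruction consistent with the quoted shapes — dims [ 16 ] for "input0", dims [ 4 ] for "output0", and a positive max_batch_size; the data type and the value 8 are assumptions — would look like:

max_batch_size: 8
input [
  {
    name: "input0"
    data_type: TYPE_FP32   # assumed; docs quote does not give the type
    dims: [ 16 ]
  }
]
output [
  {
    name: "output0"
    data_type: TYPE_FP32   # assumed
    dims: [ 4 ]
  }
]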
Hope I was able to explain my confusion clearly.
Top GitHub Comments
The max_batch_size in the model config is a property of the model. It indicates the maximum value the model can support for the first (batch) dimension. In the Triton client, batch size is treated just like any other variable dimension: the user can send requests with any supported shape.
To better explain the relation, let's assume max_batch_size is set to 10 and the ONNX model supports shape [-1, 256, 256]. The client can send 4 images in the first request, 2 in the second, and 4 in the third. Assuming the three requests were sent in quick succession, so that the dynamic batcher's max_queue_delay time had not elapsed, Triton will form a batch from these three requests (the request_count in the backend code above will be 3). The tensor data from the three requests will be merged and the model will execute with an input tensor shaped [10, 256, 256]. The client can also send a single request with 10 images; in that case the backend gets only one request, since max_batch_size is already reached. In a third case, the client can send requests with a single image each; the backend may then get up to 10 requests.
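For reference, this queuing behavior is configured through the dynamic_batching section of config.pbtxt. A minimal sketch for the scenario above — the delay value is an illustrative assumption, not something given in this thread:

max_batch_size: 10
dynamic_batching {
  # Wait up to this long for additional requests to arrive before
  # launching an execution, so that separate requests can be merged
  # into one batch. (Illustrative value; tune for your workload.)
  max_queue_delay_microseconds: 100
}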
Triton's dynamic batcher can only grow batches, not split them. Choosing an efficient max_batch_size really depends on your model, but it should be at least as large as the batch size a client is expected to send in a single request. You can look at model_analyzer for tuning these parameters.

Thank you very much. I managed to run model inference with batching with help from both of you! @dyastremsky @tanmayv25