Dynamic Batching not creating batches correctly and incorrect inference results
Description
I am deploying a Triton server to GKE via the gke-marketplace-app documentation. When I try to use dynamic batching, the requests are not batched and are only sent with a batch size of 1. Additionally, inference returns only one detection when it should return multiple.
Triton Information
The version is 2.17, as this is what the marketplace feature deploys.
Are you using the Triton container or did you build it yourself?
Deployed via the GCP Marketplace.
To Reproduce
I create the inference server with the following config:
name: "sample"
platform: "pytorch_libtorch"
max_batch_size : 16
input [
{
name: "INPUT__0"
data_type: TYPE_UINT8
format: FORMAT_NCHW
dims: [ 3, 512, 512 ]
}
]
output [
{
name: "OUTPUT__0"
data_type: TYPE_FP32
dims: [ -1, 4 ]
},
{
name: "OUTPUT__1"
data_type: TYPE_INT64
dims: [ -1 ]
label_filename: "sample.txt"
},
{
name: "OUTPUT__2"
data_type: TYPE_FP32
dims: [ -1 ]
}
]
dynamic_batching {
max_queue_delay_microseconds: 50000
}
I am calling inference as follows:
model = "sample"
client = httpclient.InferenceServerClient( url = url )
input_1 = httpclient.InferInput(name = "INPUT__0", shape = list(data.shape), datatype = "UINT8")
input_2 = httpclient.InferInput(name = "INPUT__0", shape = list(data.shape), datatype = "UINT8")
input_1.set_data_from_numpy(data, binary_data = True)
input_2.set_data_from_numpy(data, binary_data = True)
output_00 = httpclient.InferRequestedOutput(name = "OUTPUT__0", binary_data = False)
output_01 = httpclient.InferRequestedOutput(name = "OUTPUT__1", binary_data = False)
output_02 = httpclient.InferRequestedOutput(name = "OUTPUT__2", binary_data = False)
output_10 = httpclient.InferRequestedOutput(name = "OUTPUT__0", binary_data = False)
output_11 = httpclient.InferRequestedOutput(name = "OUTPUT__1", binary_data = False)
output_12 = httpclient.InferRequestedOutput(name = "OUTPUT__2", binary_data = False)
# Is this correct? I tried using reshape in the config, but it did not work. Without this I get errors about data shape.
input_1.set_shape([1, 3, 512, 512]
input_2.set_shape([1, 3, 512, 512]
response_1 = client.async_infer(model_name = model, inputs = [input_1], outputs = [output_00, output_01, output_02])
response_2 = client.async_infer(model_name = model, inputs = [input_2], outputs = [output_10, output_11, output_12])
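As an aside, a minimal sketch of an alternative to calling set_shape afterwards, assuming data is the same (3, 512, 512) uint8 array described above: add the batch dimension to the array itself so that set_data_from_numpy infers the full [1, 3, 512, 512] shape (the zero-filled array below is only a stand-in for the real image tensor):

import numpy as np
import tritonclient.http as httpclient

# Stand-in for the real image tensor; shape and dtype match the config above.
data = np.zeros((3, 512, 512), dtype=np.uint8)

# Add an explicit batch dimension instead of calling set_shape afterwards.
batched = np.expand_dims(data, axis=0)  # shape becomes (1, 3, 512, 512)

input_1 = httpclient.InferInput(name="INPUT__0", shape=list(batched.shape), datatype="UINT8")
input_1.set_data_from_numpy(batched, binary_data=True)  # shape is taken from the array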
Expected behavior
With the above code, when I run print(response_1.get_result().get_response())
I am only seeing one detection, but I know the model detects multiple objects when run directly on my local machine:
{... [{'name': 'OUTPUT__0', 'datatype': 'FP32', 'shape': [1, 4], 'data': [x_min, y_min, x_max, y_max]}, ...}
Additionally, when I run print(client.get_inference_statistics())
I am seeing only a batch size of 1 when I expect 2 in this case:
{ ... 'batch_stats': [{'batch_size': 1, 'compute_input' : {'count': 2 ...}}] ... }
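As a side note, here is a small sketch (assuming the same url endpoint used above) of how the loaded model configuration and per-model statistics can be inspected to confirm whether the dynamic_batching settings were actually picked up by the server:

import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url=url)  # same endpoint as above

# The returned config should contain the max_batch_size and dynamic_batching
# settings from config.pbtxt if the server loaded them.
print(client.get_model_config("sample"))

# Per-model statistics; the batch_stats entries show which batch sizes
# were actually executed by the model.
print(client.get_inference_statistics(model_name="sample"))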
Top GitHub Comments
Hi @omrifried,
Thanks for the reference. Do you mind sharing the complete versions of the model configuration and client code? (I saw some pieces of these above, but having complete versions would greatly help save time looking into this, thanks.)
Ticket ref: DLIS-3633
Also note that for the HTTP Python client, you will need to set concurrency on the client in order to send requests concurrently: https://github.com/triton-inference-server/client/blob/main/src/python/examples/simple_http_async_infer_client.py#L55-L58
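To illustrate the point above, a minimal sketch (assuming the same url, inputs, and outputs built in the reproduction code) of creating the client with concurrency greater than 1, so both async_infer requests can be in flight at the same time and the dynamic batcher has a chance to group them:

import tritonclient.http as httpclient

# concurrency > 1 allows multiple async_infer calls to be outstanding at once.
client = httpclient.InferenceServerClient(url=url, concurrency=2)

pending = [
    client.async_infer(model_name="sample", inputs=[input_1],
                       outputs=[output_00, output_01, output_02]),
    client.async_infer(model_name="sample", inputs=[input_2],
                       outputs=[output_10, output_11, output_12]),
]

# Block on the results only after both requests have been issued.
results = [p.get_result() for p in pending]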