
Dynamic Batching not creating batches correctly and incorrect inference results

See original GitHub issue

Description
I am deploying a Triton server to GKE via the gke-marketplace-app documentation. When I try to use dynamic batching, the requests are not batched and each one is sent with a batch size of 1. Additionally, inference returns only one detection when it should return multiple.

Triton Information
The version is 2.17, as this is what the marketplace app deploys.

Are you using the Triton container or did you build it yourself?
Deployed via the GCP Marketplace.

To Reproduce
I create the inference server with the following config:

name: "sample"
platform: "pytorch_libtorch"
max_batch_size : 16
input [
  {
    name: "INPUT__0"
    data_type: TYPE_UINT8
    format: FORMAT_NCHW
    dims: [ 3, 512, 512 ]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [ -1, 4 ]
  },
 {
    name: "OUTPUT__1"
    data_type: TYPE_INT64
    dims: [ -1 ]
   label_filename: "sample.txt"
  },
  {
    name: "OUTPUT__2"
    data_type: TYPE_FP32
    dims: [ -1 ]
  }
]
 dynamic_batching {
    max_queue_delay_microseconds: 50000
}
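For reference, the dynamic_batching block can also carry a preferred_batch_size hint alongside the queue delay. A sketch with illustrative values (this is not the configuration actually used in this issue):

dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 50000
}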

I am calling inference as follows:

model = "sample"
client = httpclient.InferenceServerClient( url = url )

input_1 = httpclient.InferInput(name = "INPUT__0", shape = list(data.shape), datatype = "UINT8")
input_2 = httpclient.InferInput(name = "INPUT__0", shape = list(data.shape), datatype = "UINT8")

input_1.set_data_from_numpy(data, binary_data = True)
input_2.set_data_from_numpy(data, binary_data = True)

output_00 = httpclient.InferRequestedOutput(name = "OUTPUT__0", binary_data = False)
output_01 = httpclient.InferRequestedOutput(name = "OUTPUT__1", binary_data = False)
output_02 = httpclient.InferRequestedOutput(name = "OUTPUT__2", binary_data = False)

output_10 = httpclient.InferRequestedOutput(name = "OUTPUT__0", binary_data = False)
output_11 = httpclient.InferRequestedOutput(name = "OUTPUT__1", binary_data = False)
output_12 = httpclient.InferRequestedOutput(name = "OUTPUT__2", binary_data = False)

# Is this correct? I tried using reshape in the config, but it did not work.
# Without this I get errors about data shape. (See the sketch after this snippet.)
input_1.set_shape([1, 3, 512, 512])
input_2.set_shape([1, 3, 512, 512])

response_1 = client.async_infer(model_name = model, inputs = [input_1], outputs = [output_00, output_01, output_02])
response_2 = client.async_infer(model_name = model, inputs = [input_2], outputs = [output_10, output_11, output_12])
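One way to avoid the set_shape() calls referenced in the comment above is to add the batch dimension to the array before building the InferInput, so the request shape already matches [batch, 3, 512, 512]. A minimal sketch, assuming data is the same (3, 512, 512) uint8 array used above (the expand_dims approach is my assumption, not something stated in the issue):

import numpy as np
import tritonclient.http as httpclient

# Add a leading batch dimension so the shape becomes (1, 3, 512, 512).
batched = np.expand_dims(data, axis=0)

# The InferInput shape already includes the batch dimension,
# so no set_shape() call is needed afterwards.
input_1 = httpclient.InferInput("INPUT__0", list(batched.shape), "UINT8")
input_1.set_data_from_numpy(batched, binary_data=True)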

Expected behavior
With the above code, when I run print(response_1.get_result().get_response()) I see only one detection, but I know the model detects multiple objects when I run inference directly on my local machine:

{... [{'name': 'OUTPUT__0', 'datatype': 'FP32', 'shape': [1, 4], 'data': [x_min, y_min, x_max, y_max]}, ...}

Additionally, when I run print(client.get_inference_statistics()) I see only a batch size of 1 when I expect 2 in this case:

{ ... 'batch_stats': [{'batch_size': 1, 'compute_input' : {'count': 2 ...}}] ... }
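For comparison, if the two requests had been folded into a single batch, the statistics would presumably report something along these lines (illustrative, not actual output):

{ ... 'batch_stats': [{'batch_size': 2, 'compute_input': {'count': 1 ...}}] ... }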

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 28 (17 by maintainers)

Top GitHub Comments

1 reaction
rmccorm4 commented on Jun 24, 2022

Hi @omrifried ,

Thanks for the reference. Do you mind sharing

  1. a script I can run as-is to generate the torchscript model (or share the model itself)?
  2. the corresponding Triton config.pbtxt to serve the model
  3. client script with sample inputs to run

(I saw some pieces of these above, but having complete versions would greatly help save time when looking into this, thanks.)


Ticket ref: DLIS-3633

1 reaction
GuanLuo commented on Apr 8, 2022

Also note that for the HTTP Python client, you will need to set the concurrency in order to send requests concurrently: https://github.com/triton-inference-server/client/blob/main/src/python/examples/simple_http_async_infer_client.py#L55-L58
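A minimal sketch of that, reusing the names from the client code above (the concurrency value of 2 is just an example): the concurrency argument controls how many requests the client can have in flight at once, so with the default of 1 the two async_infer calls are effectively serialized and cannot land in the same dynamic batch.

import tritonclient.http as httpclient

# Give the client more than one connection so the two async requests
# can be in flight at the same time and are eligible to be batched
# together on the server.
client = httpclient.InferenceServerClient(url = url, concurrency = 2)

response_1 = client.async_infer(model_name = model, inputs = [input_1],
                                outputs = [output_00, output_01, output_02])
response_2 = client.async_infer(model_name = model, inputs = [input_2],
                                outputs = [output_10, output_11, output_12])

result_1 = response_1.get_result()
result_2 = response_2.get_result()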

