
Is the dynamic batcher setting successful?

See original GitHub issue

I use the ELECTRA ONNX model to get sentence representations. This is the config.pbtxt:

```
platform: "onnxruntime_onnx"
backend: "onnxruntime"
dynamic_batching {
  preferred_batch_size: [ 4, 8, 32 ]
  max_queue_delay_microseconds: 1000
}
version_policy: {
  latest: {
    num_versions: 1
  }
}
max_batch_size: 100
input: [
  {
    name: "token_type_ids"
    data_type: TYPE_INT64
    format: FORMAT_NONE
    dims: [ 8 ]
    is_shape_tensor: false
    allow_ragged_batch: false
  },
  {
    name: "attention_mask"
    data_type: TYPE_INT64
    format: FORMAT_NONE
    dims: [ 8 ]
    is_shape_tensor: false
    allow_ragged_batch: false
  },
  {
    name: "input_ids"
    data_type: TYPE_INT64
    format: FORMAT_NONE
    dims: [ 8 ]
    is_shape_tensor: false
    allow_ragged_batch: false
  }
]
output: [
  {
    name: "output_electra"
    data_type: TYPE_FP32
    dims: [ 8, 256 ]
    label_filename: ""
    is_shape_tensor: false
  }
]
batch_input: []
batch_output: []
optimization: {
  priority: PRIORITY_DEFAULT
  input_pinned_memory: {
    enable: true
  }
  output_pinned_memory: {
    enable: true
  }
}
instance_group: [
  {
    name: "electra_onnx_model"
    kind: KIND_GPU
    count: 1
    gpus: [ 0 ]
    profile: []
  }
]
default_model_filename: "model.onnx"
cc_model_filenames: {}
metric_tags: {}
parameters: {}
model_warmup: []
```
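One way to double-check that the batcher was actually picked up is to read the loaded configuration back from Triton's model-configuration endpoint; if dynamic_batching appears there, the batcher is at least enabled. A minimal sketch, assuming the server's HTTP port is the default 8000 on localhost and that the Python requests package is available (the JSON field names are assumed to follow the model-config schema):

```python
import requests

TRITON_URL = "http://localhost:8000"
MODEL = "electra_onnx_model"

# The model-configuration endpoint returns the configuration Triton actually
# loaded (after any auto-completion), so it shows whether dynamic batching
# is in effect for this model.
resp = requests.get(f"{TRITON_URL}/v2/models/{MODEL}/config")
resp.raise_for_status()
config = resp.json()

print("max_batch_size:", config.get("max_batch_size"))
print("dynamic_batching:", config.get("dynamic_batching"))
```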


I use HTTP to get results via the path /v2/models/electra_onnx_model/infer.
I can get the correct response with an input shape of [batchsize, 8],
but I wonder whether the dynamic batcher is configured correctly.
When testing with JMeter, the server seems to process only one request at a time instead of combining requests into a batch:


```
I0906 07:57:06.702650 1 http_server.cc:1229] HTTP request: 2 /v2/models/electra_onnx_model/infer
I0906 07:57:06.702691 1 model_repository_manager.cc:496] GetInferenceBackend() 'electra_onnx_model' version -1
I0906 07:57:06.702704 1 model_repository_manager.cc:496] GetInferenceBackend() 'electra_onnx_model' version -1
I0906 07:57:06.702763 1 infer_request.cc:502] prepared: [0x0x7f228801d9d0] request id: , model: electra_onnx_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 1, priority: 0, timeout (us): 0
original inputs:
[0x0x7f228801f4d8] input: token_type_ids, type: INT64, original shape: [1,8], batch + shape: [1,8], shape: [8]
[0x0x7f228801d828] input: attention_mask, type: INT64, original shape: [1,8], batch + shape: [1,8], shape: [8]
[0x0x7f2288012ca8] input: input_ids, type: INT64, original shape: [1,8], batch + shape: [1,8], shape: [8]
override inputs:
inputs:
[0x0x7f2288012ca8] input: input_ids, type: INT64, original shape: [1,8], batch + shape: [1,8], shape: [8]
[0x0x7f228801d828] input: attention_mask, type: INT64, original shape: [1,8], batch + shape: [1,8], shape: [8]
[0x0x7f228801f4d8] input: token_type_ids, type: INT64, original shape: [1,8], batch + shape: [1,8], shape: [8]
original requested outputs:
output_electra
requested outputs:
output_electra

I0906 07:57:06.702829 1 onnxruntime.cc:1896] model electra_onnx_model, instance electra_onnx_model, executing 1 requests
I0906 07:57:06.702844 1 onnxruntime.cc:940] TRITONBACKEND_ModelExecute: Running electra_onnx_model with 1 requests
I0906 07:57:06.702858 1 pinned_memory_manager.cc:131] pinned memory allocation: size 64, addr 0x7f237e000090
I0906 07:57:06.702876 1 pinned_memory_manager.cc:131] pinned memory allocation: size 64, addr 0x7f237e0000e0
I0906 07:57:06.702884 1 pinned_memory_manager.cc:131] pinned memory allocation: size 64, addr 0x7f237e000130
2021-09-06 07:57:06.703016809 [I:onnxruntime:, sequential_executor.cc:157 Execute] Begin execution
2021-09-06 07:57:06.703764739 [I:onnxruntime:, sequential_executor.cc:469 Execute] [Memory] ExecutionFrame statically allocates 98368 bytes for Cuda

2021-09-06 07:57:06.703774308 [I:onnxruntime:, sequential_executor.cc:469 Execute] [Memory] ExecutionFrame statically allocates 64 bytes for Cpu

2021-09-06 07:57:06.703778428 [I:onnxruntime:, sequential_executor.cc:469 Execute] [Memory] ExecutionFrame statically allocates 64 bytes for CUDA_CPU

2021-09-06 07:57:06.703782619 [I:onnxruntime:, sequential_executor.cc:474 Execute] [Memory] ExecutionFrame dynamically allocates 8192 bytes for Cuda

I0906 07:57:06.704385 1 infer_response.cc:165] add response output: output: output_electra, type: FP32, shape: [1,8,256]
I0906 07:57:06.704400 1 http_server.cc:1200] HTTP using buffer for: 'output_electra', size: 8192, addr: 0x7f22c427a140
I0906 07:57:06.704736 1 http_server.cc:1215] HTTP release: size 8192, addr 0x7f22c427a140
I0906 07:57:06.704752 1 pinned_memory_manager.cc:158] pinned memory deallocation: addr 0x7f237e000090
I0906 07:57:06.704762 1 pinned_memory_manager.cc:158] pinned memory deallocation: addr 0x7f237e0000e0
I0906 07:57:06.704770 1 pinned_memory_manager.cc:158] pinned memory deallocation: addr 0x7f237e000130
```
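For reference, a single request like the one traced above can be reproduced without JMeter using the KServe v2 HTTP inference protocol. This is only a sketch: it assumes Triton's HTTP endpoint is at localhost:8000 and uses dummy token IDs rather than real tokenizer output.

```python
import requests

TRITON_URL = "http://localhost:8000"  # Triton's default HTTP port
MODEL = "electra_onnx_model"

# Dummy token IDs with the [1, 8] shape seen in the log above; real requests
# would use tokenizer output instead.
payload = {
    "inputs": [
        {"name": "input_ids", "shape": [1, 8], "datatype": "INT64",
         "data": [101, 2023, 2003, 1037, 3231, 6251, 102, 0]},
        {"name": "attention_mask", "shape": [1, 8], "datatype": "INT64",
         "data": [1, 1, 1, 1, 1, 1, 1, 0]},
        {"name": "token_type_ids", "shape": [1, 8], "datatype": "INT64",
         "data": [0] * 8},
    ],
    "outputs": [{"name": "output_electra"}],
}

resp = requests.post(f"{TRITON_URL}/v2/models/{MODEL}/infer", json=payload)
resp.raise_for_status()
out = resp.json()["outputs"][0]
print(out["name"], out["shape"])  # expect: output_electra [1, 8, 256]
```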


Is anything wrong with this configuration? How do I set up the dynamic batcher correctly?

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

1 reaction
tanmayv25 commented on Sep 8, 2021

Are you creating sufficient concurrent requests to the server using JMeter? If the interval between requests is greater than 1000 us, there will not be any batching. If you do have sufficient request concurrency, then try increasing the max_queue_delay_microseconds parameter.
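To make "sufficient request concurrency" concrete, the sketch below fires requests from a thread pool instead of sequentially; the endpoint, port, and dummy payload are assumptions carried over from the config above rather than anything from the original thread.

```python
from concurrent.futures import ThreadPoolExecutor

import requests

TRITON_URL = "http://localhost:8000"
MODEL = "electra_onnx_model"

# Same dummy [1, 8] payload as before; only the request pattern changes.
payload = {
    "inputs": [
        {"name": "input_ids", "shape": [1, 8], "datatype": "INT64", "data": [101] + [0] * 7},
        {"name": "attention_mask", "shape": [1, 8], "datatype": "INT64", "data": [1] * 8},
        {"name": "token_type_ids", "shape": [1, 8], "datatype": "INT64", "data": [0] * 8},
    ],
    "outputs": [{"name": "output_electra"}],
}

def one_request(_):
    resp = requests.post(f"{TRITON_URL}/v2/models/{MODEL}/infer", json=payload)
    resp.raise_for_status()
    return resp.elapsed.total_seconds()

# The batcher can only merge requests whose queue times overlap within the
# max_queue_delay window, so many requests must be in flight at once.
with ThreadPoolExecutor(max_workers=32) as pool:
    latencies = list(pool.map(one_request, range(512)))

print(f"{len(latencies)} requests, avg latency {sum(latencies) / len(latencies):.4f}s")
```

If batching kicks in, the verbose server log should start reporting "executing N requests" with N greater than 1, and (assuming the metrics endpoint is enabled) the nv_inference_count counter should grow faster than nv_inference_exec_count.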

0 reactions
dzier commented on Sep 8, 2021

Looks like this issue has been resolved. Please re-open if you have more questions.
