Is the dynamic batcher setting successful?
I use an ELECTRA ONNX model to get sentence representations. This is the config.pbtxt:
```
platform: "onnxruntime_onnx"
backend: "onnxruntime"
dynamic_batching {
  preferred_batch_size: [ 4, 8, 32 ]
  max_queue_delay_microseconds: 1000
}
version_policy: {
  latest: {
    num_versions: 1
  }
}
max_batch_size: 100
input: [
  {
    name: "token_type_ids"
    data_type: TYPE_INT64
    format: FORMAT_NONE
    dims: [8]
    is_shape_tensor: false
    allow_ragged_batch: false
  },
  {
    name: "attention_mask"
    data_type: TYPE_INT64
    format: FORMAT_NONE
    dims: [8]
    is_shape_tensor: false
    allow_ragged_batch: false
  },
  {
    name: "input_ids"
    data_type: TYPE_INT64
    format: FORMAT_NONE
    dims: [8]
    is_shape_tensor: false
    allow_ragged_batch: false
  }
]
output: [
  {
    name: "output_electra"
    data_type: TYPE_FP32
    dims: [8,256]
    label_filename: ""
    is_shape_tensor: false
  }
]
batch_input: []
batch_output: []
optimization: {
  priority: PRIORITY_DEFAULT
  input_pinned_memory: {
    enable: true
  }
  output_pinned_memory: {
    enable: true
  }
}
instance_group: [
  {
    name: "electra_onnx_model"
    kind: KIND_GPU
    count: 1
    gpus: [0]
    profile: []
  }
]
default_model_filename: "model.onnx"
cc_model_filenames: {}
metric_tags: {}
parameters: {}
model_warmup: []
```
I send HTTP requests to the path /v2/models/electra_onnx_model/infer and get correct responses with input shape [batchsize, 8].
But I am not sure whether the dynamic batcher is configured correctly: when load testing with jmeter, the server seems to execute only one request at a time instead of combining requests into a batch. This is the verbose log for a single request:
```
I0906 07:57:06.702650 1 http_server.cc:1229] HTTP request: 2 /v2/models/electra_onnx_model/infer
I0906 07:57:06.702691 1 model_repository_manager.cc:496] GetInferenceBackend() 'electra_onnx_model' version -1
I0906 07:57:06.702704 1 model_repository_manager.cc:496] GetInferenceBackend() 'electra_onnx_model' version -1
I0906 07:57:06.702763 1 infer_request.cc:502] prepared: [0x0x7f228801d9d0] request id: , model: electra_onnx_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 1, priority: 0, timeout (us): 0
original inputs:
[0x0x7f228801f4d8] input: token_type_ids, type: INT64, original shape: [1,8], batch + shape: [1,8], shape: [8]
[0x0x7f228801d828] input: attention_mask, type: INT64, original shape: [1,8], batch + shape: [1,8], shape: [8]
[0x0x7f2288012ca8] input: input_ids, type: INT64, original shape: [1,8], batch + shape: [1,8], shape: [8]
override inputs:
inputs:
[0x0x7f2288012ca8] input: input_ids, type: INT64, original shape: [1,8], batch + shape: [1,8], shape: [8]
[0x0x7f228801d828] input: attention_mask, type: INT64, original shape: [1,8], batch + shape: [1,8], shape: [8]
[0x0x7f228801f4d8] input: token_type_ids, type: INT64, original shape: [1,8], batch + shape: [1,8], shape: [8]
original requested outputs:
output_electra
requested outputs:
output_electra
I0906 07:57:06.702829 1 onnxruntime.cc:1896] model electra_onnx_model, instance electra_onnx_model, executing 1 requests
I0906 07:57:06.702844 1 onnxruntime.cc:940] TRITONBACKEND_ModelExecute: Running electra_onnx_model with 1 requests
I0906 07:57:06.702858 1 pinned_memory_manager.cc:131] pinned memory allocation: size 64, addr 0x7f237e000090
I0906 07:57:06.702876 1 pinned_memory_manager.cc:131] pinned memory allocation: size 64, addr 0x7f237e0000e0
I0906 07:57:06.702884 1 pinned_memory_manager.cc:131] pinned memory allocation: size 64, addr 0x7f237e000130
2021-09-06 07:57:06.703016809 [I:onnxruntime:, sequential_executor.cc:157 Execute] Begin execution
2021-09-06 07:57:06.703764739 [I:onnxruntime:, sequential_executor.cc:469 Execute] [Memory] ExecutionFrame statically allocates 98368 bytes for Cuda
2021-09-06 07:57:06.703774308 [I:onnxruntime:, sequential_executor.cc:469 Execute] [Memory] ExecutionFrame statically allocates 64 bytes for Cpu
2021-09-06 07:57:06.703778428 [I:onnxruntime:, sequential_executor.cc:469 Execute] [Memory] ExecutionFrame statically allocates 64 bytes for CUDA_CPU
2021-09-06 07:57:06.703782619 [I:onnxruntime:, sequential_executor.cc:474 Execute] [Memory] ExecutionFrame dynamically allocates 8192 bytes for Cuda
I0906 07:57:06.704385 1 infer_response.cc:165] add response output: output: output_electra, type: FP32, shape: [1,8,256]
I0906 07:57:06.704400 1 http_server.cc:1200] HTTP using buffer for: 'output_electra', size: 8192, addr: 0x7f22c427a140
I0906 07:57:06.704736 1 http_server.cc:1215] HTTP release: size 8192, addr 0x7f22c427a140
I0906 07:57:06.704752 1 pinned_memory_manager.cc:158] pinned memory deallocation: addr 0x7f237e000090
I0906 07:57:06.704762 1 pinned_memory_manager.cc:158] pinned memory deallocation: addr 0x7f237e0000e0
I0906 07:57:06.704770 1 pinned_memory_manager.cc:158] pinned memory deallocation: addr 0x7f237e000130
```
Is anything wrong with the configuration? How do I set up the dynamic batcher correctly?
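For reference, the load test sends requests roughly like the following minimal Python sketch (using the `requests` library and a thread pool instead of jmeter; the URL assumes Triton's default HTTP port 8000 and the token IDs are placeholder values; the payload follows the KServe v2 HTTP/JSON format that Triton's /infer endpoint accepts):
```
# Minimal sketch: send many concurrent single requests so that several of them
# can arrive inside the dynamic batcher's 1000 us queue-delay window.
# Assumptions: Triton listens on localhost:8000; token IDs are placeholders.
import concurrent.futures
import requests

URL = "http://localhost:8000/v2/models/electra_onnx_model/infer"

# One request with batch dimension 1 (shape [1, 8]); the dynamic batcher is
# what should combine several of these on the server side.
PAYLOAD = {
    "inputs": [
        {"name": "input_ids", "shape": [1, 8], "datatype": "INT64",
         "data": [101, 2023, 2003, 1037, 3231, 6251, 102, 0]},
        {"name": "attention_mask", "shape": [1, 8], "datatype": "INT64",
         "data": [1, 1, 1, 1, 1, 1, 1, 0]},
        {"name": "token_type_ids", "shape": [1, 8], "datatype": "INT64",
         "data": [0, 0, 0, 0, 0, 0, 0, 0]},
    ],
    "outputs": [{"name": "output_electra"}],
}

def send_one(_):
    resp = requests.post(URL, json=PAYLOAD, timeout=10)
    resp.raise_for_status()
    return resp.json()["outputs"][0]["shape"]

# 64 in-flight requests; with enough concurrency the verbose log should show
# "executing N requests" with N > 1.
with concurrent.futures.ThreadPoolExecutor(max_workers=64) as pool:
    shapes = list(pool.map(send_one, range(256)))

print(shapes[0])  # e.g. [1, 8, 256]
```
Each request carries batch dimension 1, matching the original shape: [1,8] entries in the log above; combining them into larger batches is the dynamic batcher's job on the server side.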
Top GitHub Comments
Are you creating sufficient concurrent requests to the server with jmeter? If the intervals between requests are greater than 1000 µs, there will not be any batching. If you do have sufficient request concurrency, then try increasing the max_queue_delay_microseconds parameter.

Looks like this issue has been resolved. Please re-open if you have more questions.
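One way to confirm whether batches are actually being formed, beyond grepping the verbose log for "executing N requests", is to query the model statistics after a load run. The sketch below is only an assumption-level example: it uses the /v2/models/<model>/stats endpoint from Triton's statistics extension, and the exact JSON field names (model_stats, batch_stats, compute_infer) may differ slightly between Triton versions:
```
# Sketch: check which batch sizes the server has actually executed.
# Assumptions: Triton's statistics extension is enabled (default in standard
# builds) and the server is reachable on the default HTTP port 8000.
import requests

stats = requests.get(
    "http://localhost:8000/v2/models/electra_onnx_model/stats", timeout=10
).json()

for model in stats.get("model_stats", []):
    print("model:", model.get("name"), "version:", model.get("version"))
    for bs in model.get("batch_stats", []):
        count = bs.get("compute_infer", {}).get("count", 0)
        print("  batch_size", bs.get("batch_size"), "executed", count, "times")
```
If every reported entry has batch_size 1, requests are arriving more than max_queue_delay_microseconds apart, so the client concurrency (for example the jmeter thread count) needs to be raised before tuning the delay.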