Efficient Way to Send And Retrieve Image Inference Response
Description
I am trying to run an image through a model whose output dimensions are [3096, 3096, 3], a fairly large image. I have modified the inference client script accordingly. Inference itself takes around 0.9 s, but the response takes about 5 s to reach the client. If I reduce the input and output to [100, 100, 3], the response time drops from 5 s to 0.1 s, which is clearly due to sending the large image in Tensor format. Both the Triton server and the client are on the same local machine.
I am using pb_utils.Tensor to send the inference response:
out_tensor_0 = pb_utils.Tensor("OUTPUT_0", output.astype(output0_dtype)) #output0_dtype is object type
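For context, that line sits inside the Python backend's execute() method, roughly as follows (a minimal sketch, not my exact model.py; run_model is a placeholder for the actual PyTorch inference and postprocessing):

```python
import numpy as np
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            in_0 = pb_utils.get_input_tensor_by_name(request, "INPUT_0").as_numpy()
            output = run_model(in_0)  # placeholder: produces the [3096, 3096, 3] result
            # TYPE_STRING outputs are passed as numpy object arrays
            out_tensor_0 = pb_utils.Tensor("OUTPUT_0", output.astype(np.object_))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor_0]))
        return responses
```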
Here is the model's config.pbtxt file.
name: "model"
backend: "python"
max_batch_size: 8
input [
  {
    name: "INPUT_0"
    data_type: TYPE_FP32
    format: FORMAT_NHWC
    dims: [ 1536, 1536, 3 ]
  }
]
output [
  {
    name: "OUTPUT_0"
    data_type: TYPE_STRING
    dims: [ 3096, 3096, 3 ]
  }
]
instance_group [{ kind: KIND_GPU }]
parameters: {
  key: "EXECUTION_ENV_PATH",
  value: { string_value: "$$TRITON_MODEL_DIRECTORY/custom_env_v2.tar.gz" }
}
And this is the client side:
infer = results.as_numpy('OUTPUT_0')
I have also enabled the binary data flag, but so far it has not helped either:
outputs = [ client.InferRequestedOutput(output_name, binary_data=True) ]
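For reference, the full request on the client side looks roughly like this (a sketch; the server URL and the dummy input batch are placeholders):

```python
import numpy as np
import tritonclient.http as client

triton = client.InferenceServerClient(url="localhost:8000")

# placeholder input batch matching the [1536, 1536, 3] NHWC input
image = np.random.rand(1, 1536, 1536, 3).astype(np.float32)

inputs = [client.InferInput("INPUT_0", image.shape, "FP32")]
inputs[0].set_data_from_numpy(image, binary_data=True)

outputs = [client.InferRequestedOutput("OUTPUT_0", binary_data=True)]

results = triton.infer(model_name="model", inputs=inputs, outputs=outputs)
infer = results.as_numpy("OUTPUT_0")
```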
I tried hard to use base64 encoding to reduce this overhead, but it wasn't possible because pb_utils.Tensor must be used and it only accepts a numpy array as input. Please let me know how I can send large images from server to client more efficiently. Thanks!
Triton Information: tritonserver:latest
Are you using the Triton container or did you build it yourself? I built the container myself.
Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well). I am using the Python backend and the model is a PyTorch model.
Expected behavior: The inference response should be returned faster and more efficiently.
Top GitHub Comments
BTW, I created a solution which seems to be much more efficient, using cv2.imencode and cv2.imdecode.
Here is the server side.
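Roughly, the idea is to compress the output image with cv2.imencode and return the bytes as a single TYPE_STRING element (so OUTPUT_0 in config.pbtxt becomes a single string, e.g. dims: [ 1 ], instead of [ 3096, 3096, 3 ]). A minimal sketch of that idea, with run_model as a placeholder for the actual model:

```python
# model.py (sketch) -- encode the output image to compressed bytes before sending
import cv2
import numpy as np
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            in_0 = pb_utils.get_input_tensor_by_name(request, "INPUT_0").as_numpy()
            output = run_model(in_0)  # placeholder: produces a [3096, 3096, 3] uint8 image
            # compress to JPEG so only a few MB cross the wire instead of the raw tensor
            ok, encoded = cv2.imencode(".jpg", output)
            out_tensor_0 = pb_utils.Tensor(
                "OUTPUT_0", np.array([encoded.tobytes()], dtype=np.object_)
            )
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor_0]))
        return responses
```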
And here is the client side:
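On the client, the single string element is decoded back into an image with cv2.imdecode (again just a sketch, assuming the server returns JPEG bytes as above):

```python
import cv2
import numpy as np

# OUTPUT_0 now contains one element holding the JPEG-encoded bytes
encoded = results.as_numpy("OUTPUT_0")[0]
image = cv2.imdecode(np.frombuffer(encoded, dtype=np.uint8), cv2.IMREAD_COLOR)
```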
Hope this will help someone else, or maybe the Triton team can suggest a better way than this. Thanks!
I simply input an image URL and move the downloading and preprocessing to the server side.
model.py
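A minimal sketch of that setup (the input name IMAGE_URL, the resize step, and the output handling are assumptions; run_model stands in for the actual PyTorch model):

```python
# model.py (sketch) -- the client sends only an image URL; download and
# preprocessing happen inside the Python backend
import cv2
import numpy as np
import requests
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def execute(self, inference_requests):
        responses = []
        for request in inference_requests:
            urls = pb_utils.get_input_tensor_by_name(request, "IMAGE_URL").as_numpy().reshape(-1)
            raw = requests.get(urls[0].decode("utf-8"), timeout=10).content
            image = cv2.imdecode(np.frombuffer(raw, dtype=np.uint8), cv2.IMREAD_COLOR)
            image = cv2.resize(image, (1536, 1536)).astype(np.float32)  # assumed preprocessing
            output = run_model(image)  # placeholder for the actual model
            out_tensor_0 = pb_utils.Tensor("OUTPUT_0", output)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor_0]))
        return responses
```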
The performance declines at around 100 concurrent requests, though, and I wonder if there is any better practice for image inference.