
Efficient Way to Send And Retrieve Image Inference Response

See original GitHub issue

Description

I am trying to run inference on an image through a model whose output dimensions are [3096, 3096, 3], a fairly large image. I have modified the inference client script accordingly. The inference itself takes around 0.9 s, but the response takes about 5 seconds to travel from the server to the client. When I reduce the input and output to [100, 100, 3], the response time drops from 5 seconds to 0.1 seconds, which is obviously due to sending a large image in Tensor format. Both the Triton server and the client are on the same local machine.

I am using pb_utils.Tensor to send the inference response (output0_dtype is an object type):

    out_tensor_0 = pb_utils.Tensor("OUTPUT_0", output.astype(output0_dtype))
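
For context, here is a minimal sketch of how such an output tensor is typically built and returned from a Python backend's execute() method; the forward pass below is only a placeholder (the real model code is not shown in the issue), and a TYPE_STRING output is carried as a numpy object/bytes array:

    import numpy as np
    import triton_python_backend_utils as pb_utils

    class TritonPythonModel:
        def execute(self, requests):
            responses = []
            for request in requests:
                # Input tensor as declared in config.pbtxt
                in_0 = pb_utils.get_input_tensor_by_name(request, "INPUT_0").as_numpy()

                # Placeholder for the actual PyTorch forward pass
                output = in_0

                # TYPE_STRING outputs travel as numpy object/bytes arrays
                out_tensor_0 = pb_utils.Tensor("OUTPUT_0", output.astype(np.object_))
                responses.append(
                    pb_utils.InferenceResponse(output_tensors=[out_tensor_0]))
            return responses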

Here is the model's config.pbtxt file:

name: "model"
backend: "python"
max_batch_size: 8
input [
  {
    name: "INPUT_0"
    data_type: TYPE_FP32
    format: FORMAT_NHWC
    dims: [ 1536, 1536, 3 ]
  }
]
output [
  {
    name: "OUTPUT_0"
    data_type: TYPE_STRING
    dims: [ 3096, 3096, 3 ]
  }
]

instance_group [{ kind: KIND_GPU }]

parameters: {
  key: "EXECUTION_ENV_PATH",
  value: { string_value: "$$TRITON_MODEL_DIRECTORY/custom_env_v2.tar.gz" }
}

And this is the client side:

infer = results.as_numpy('OUTPUT_0')

I have also enabled the binary data flag, but it has not helped so far:

outputs = [ client.InferRequestedOutput(output_name, binary_data=True) ]
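
For reference, this is roughly how the full HTTP client call with binary_data=True fits together; the URL, model name, and dummy input below are assumptions based on the config above, not the actual client script:

    import numpy as np
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url="localhost:8000")

    # Dummy batch of one image matching the INPUT_0 shape in config.pbtxt
    image = np.random.rand(1, 1536, 1536, 3).astype(np.float32)

    inputs = [httpclient.InferInput("INPUT_0", list(image.shape), "FP32")]
    inputs[0].set_data_from_numpy(image, binary_data=True)

    outputs = [httpclient.InferRequestedOutput("OUTPUT_0", binary_data=True)]

    results = client.infer(model_name="model", inputs=inputs, outputs=outputs)
    infer = results.as_numpy("OUTPUT_0")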

I tried hard to use base64 encoding to reduce this overhead, but it wasn't working because pb_utils.Tensor must be used and it only accepts a numpy array as input. Please let me know how I can send large images from the server to the client more efficiently. Thanks!

Triton Information: tritonserver:latest

Are you using the Triton container or did you build it yourself? I built the container myself.

Describe the models (framework, inputs, outputs), ideally including the model configuration file (if using an ensemble, include its model configuration file as well). I am using the Python backend, and the model is a PyTorch model.

Expected behavior: The inference response should be returned faster and more efficiently.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

3 reactions
mabdullahrafique commented, Jan 11, 2022

BTW, I created a solution that seems to be much more efficient, using cv2.imencode and cv2.imdecode.

Here is the server side.

            # Run the PyTorch model on the preprocessed image
            output, _ = self.model(img)
            time2 = time.time()

            print("Inference time took {}".format(time2 - time1), flush=True)

            # JPEG-encode the output image to shrink the response payload
            output = cv2.imencode('.jpg', output)[1]

            out_tensor_0 = pb_utils.Tensor("OUTPUT_0",
                                           output.astype(output0_dtype))

            print("Creation of output tensor took {}".format(time.time() - time2))

            timev3 = time.time()

            # pb_utils.InferenceResponse(
            #    output_tensors=..., TritonError("An error occurred"))
            inference_response = pb_utils.InferenceResponse(
                output_tensors=[out_tensor_0])

And here is the client side

                # Pull the JPEG-encoded bytes from the response and decode them
                infer = results.as_numpy('OUTPUT_0')
                print(infer)
                output = cv.imdecode(infer, cv.IMREAD_COLOR)

                cv.imwrite("{}_output_image.jpg".format(count), output)

Hope this helps someone else, or maybe the Triton team can suggest a better way than this. Thanks!
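
Another option when the client and server share a machine is Triton's system shared-memory support, so the output bytes are never serialized over HTTP at all. Below is a rough, untested sketch based on the shared-memory utilities shipped with the Triton client; the region name, key, and byte size are made up, and this approach suits fixed-size numeric outputs rather than variable-length BYTES tensors:

    import numpy as np
    import tritonclient.http as httpclient
    import tritonclient.utils as utils
    import tritonclient.utils.shared_memory as shm

    client = httpclient.InferenceServerClient(url="localhost:8000")
    client.unregister_system_shared_memory()

    # Size the region for the largest expected output (example value only)
    output_byte_size = 3096 * 3096 * 3 * 4

    # Create a system shared-memory region and register it with the server
    shm_handle = shm.create_shared_memory_region("output_data", "/output_shm", output_byte_size)
    client.register_system_shared_memory("output_data", "/output_shm", output_byte_size)

    # Build the request input as in the question's client script
    image = np.random.rand(1, 1536, 1536, 3).astype(np.float32)
    inputs = [httpclient.InferInput("INPUT_0", list(image.shape), "FP32")]
    inputs[0].set_data_from_numpy(image, binary_data=True)

    # Ask the server to place the output tensor in the shared-memory region
    outputs = [httpclient.InferRequestedOutput("OUTPUT_0", binary_data=True)]
    outputs[-1].set_shared_memory("output_data", output_byte_size)

    results = client.infer(model_name="model", inputs=inputs, outputs=outputs)

    # Read the result directly from shared memory instead of the HTTP body
    out = results.get_output("OUTPUT_0")
    infer = shm.get_contents_as_numpy(
        shm_handle, utils.triton_to_np_dtype(out["datatype"]), out["shape"])

    shm.destroy_shared_memory_region(shm_handle)
    client.unregister_system_shared_memory()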

0 reactions
xiaoFine commented, Oct 8, 2022

I simply pass in the image URL and move the downloading and preprocessing to the server side.

backend: "python"
dynamic_batching {}
max_batch_size: 64

input [
  {
    name: "URL"
    data_type: TYPE_STRING
    dims: [1]
    reshape: { shape: [ ] }

  }
]

output [
  {
    name: "BOXES"
    data_type: TYPE_INT32
    dims: [-1,4]
  }
]

model.py

# Read the URL string out of the request tensor
in_0 = pb_utils.get_input_tensor_by_name(request, "URL").as_numpy().astype(np.bytes_)[0]

url = in_0.decode('utf-8')
# Download the image and decode it into an OpenCV array
image_bytes = await download(url)
image_array = np.frombuffer(image_bytes, dtype=np.uint8)
image = cv2.imdecode(image_array, cv2.IMREAD_COLOR)

image_processed = preprocess(image)
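
download() and preprocess() above are the commenter's own helpers and are not shown; a hypothetical download() might look roughly like this, assuming aiohttp is available in the backend's execution environment:

    import aiohttp

    async def download(url: str) -> bytes:
        # Fetch the raw image bytes for the given URL
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as resp:
                resp.raise_for_status()
                return await resp.read()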

The performance declines at around 100 concurrent requests, and I wonder if there is a better practice for image inference.
