
gRPC client ~4x slower than HTTP requests in Docker

See original GitHub issue

I'm running containerized TorchServe to deploy a simple face recognition model. I have a 20 MB TorchScript model that takes an input image of size 3x72x53 and returns K-dimensional features as the embeddings. Simply using the requests library in Python, my inference runtime is 0.02 seconds (which is very good, since it's very close to running the same TorchScript model locally), but running the same model with the gRPC client as described in the docs takes > 0.07 seconds.

Now out of curiosity, I ran the same client script inside the Docker container for both HTTP and gRPC, and found that gRPC is just as fast, if not faster. This suggests the issue has something to do with Docker's port forwarding.
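A cheap way to probe that theory (a hypothetical experiment, not something from the thread): gRPC resolves localhost through its own resolver and may prefer ::1 over 127.0.0.1, and Docker's userland proxy can treat the two addresses differently, so pinning the target address is worth a try.

import grpc

# Hypothetical experiment: compare channel setup against 'localhost' vs an
# explicit 127.0.0.1 target, then rerun the same 100-request timing loop on each.
for target in ('localhost:7070', '127.0.0.1:7070'):
    channel = grpc.insecure_channel(target)
    grpc.channel_ready_future(channel).result(timeout = 5)  # block until connected
    # ... run the benchmark from the client script below against this channel ...
    channel.close()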

Here’s my config.properties:

inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
number_of_netty_threads=32
job_queue_size=1000
model_store=/home/model-server/model-store
load_models=all
# install per-model Python deps, since the handler needs cv2
# (in a .properties file, '#' only starts a comment at the beginning of a line,
# so an inline comment would become part of the value)
install_py_dep_per_model=true
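
For completeness: the file above only configures the HTTP endpoints; TorchServe serves gRPC on ports 7070 (inference) and 7071 (management) by default. Newer TorchServe releases also accept explicit keys for these (key names from the TorchServe configuration docs; availability may depend on your version):

grpc_inference_port=7070
grpc_management_port=7071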

This is my handler:

import torch
import base64
from ts.torch_handler.base_handler import BaseHandler
from torchvision import transforms
from PIL import Image
import cv2
import numpy as np

class ModelHandler(BaseHandler):
    def initialize(self, context):
        super().initialize(context)

        self.transform = transforms.Compose([
            transforms.Resize((72, 54)),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
        ])

    def preprocess(self, data):
        images = []
        for instance in data:
            try:
                cv2_encoded_bytes = instance.get("data") or instance.get("body")
            except AttributeError:  # instance is already the raw payload, not a dict
                cv2_encoded_bytes = instance

            if isinstance(cv2_encoded_bytes, (bytearray, bytes)):
                cv2_decoded_bytes = base64.b64decode(cv2_encoded_bytes)
                cv2_decoded = np.frombuffer(cv2_decoded_bytes, dtype = np.uint8)
                cv2_image = cv2.imdecode(cv2_decoded, 1)

                # I'm sending the cv2.imencode output as the model input to keep
                # the network payload small, hence cv2.imdecode to get the image back

                image = Image.fromarray(cv2_image)
                image = self.transform(image)
            else:
                # already a decoded array of pixel values, not encoded bytes
                image = torch.FloatTensor(cv2_encoded_bytes)

            images.append(image)

        return torch.stack(images).to(self.device)

    def postprocess(self, inference_output): # same as BaseHandler
        return inference_output.tolist()
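
As an aside, the base64 → cv2 round trip in preprocess can be sanity-checked outside TorchServe. A minimal standalone sketch (not part of the original handler), mirroring what the client sends:

import base64
import cv2
import numpy as np

# encode exactly like the client does
img = np.ones((512, 512, 3), dtype = np.uint8)
_, enc = cv2.imencode('.jpg', img, [int(cv2.IMWRITE_JPEG_QUALITY), 50])
payload = base64.b64encode(enc)

# decode exactly like preprocess does
decoded = np.frombuffer(base64.b64decode(payload), dtype = np.uint8)
restored = cv2.imdecode(decoded, 1)
assert restored.shape == img.shape  # same HxWxC after the round trip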

And below is my client file with the logic for calling the model over HTTP and gRPC:

import cv2
import base64
import numpy as np
import requests
from time import time

import grpc
import inference_pb2
import inference_pb2_grpc
import management_pb2_grpc

cv2_image = np.ones((512, 512, 3), dtype = np.uint8)
h, w, c = cv2_image.shape
has_encoded, cv2_encoded = cv2.imencode('.jpg', cv2_image, [int(cv2.IMWRITE_JPEG_QUALITY), 50])
cv2_encoded_bytes = base64.b64encode(cv2_encoded)

def infer(stub, model_name, model_input):
    return stub.Predictions(
        inference_pb2.PredictionsRequest(
            model_name = model_name,
            input = {'data': model_input}
        )
    )
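
def parse_prediction(response):
    # Added for illustration: the gRPC response carries the payload as raw bytes.
    # The field name 'prediction' is assumed from TorchServe's inference.proto;
    # verify it against your generated inference_pb2.
    import json
    return json.loads(response.prediction)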

if __name__ == '__main__':
    channel = grpc.insecure_channel('localhost:7070')
    stub = inference_pb2_grpc.InferenceAPIsServiceStub(channel)

    times = []
    for _ in range(100):
        start = time()
        response = requests.post("http://localhost:8080/predictions/face-recognition/", data = cv2_encoded_bytes)
        # response = infer(stub, 'face-recognition', cv2_encoded_bytes)
        took = time() - start
        times.append(took)
        print(f'Took \t: {took}')

    times = np.array(times)
    print(f'Mean time \t: {np.mean(times)}')
    print(f'Median time \t: {np.median(times)}')

Outputs with HTTP when running from the host:

Mean time       : 0.025488979816436767
Median time     : 0.023711800575256348

Outputs with gRPC when running from the host:

Mean time       : 0.07445537805557251
Median time     : 0.07411670684814453

Outputs with HTTP when running within the container:

Mean time       : 0.02136315107345581
Median time     : 0.02019190788269043

Outputs with gRPC when running within the container:

Mean time       : 0.019478685855865478
Median time     : 0.01735556125640869
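
The container-vs-host gap points at the network path rather than the model itself. A quick, hypothetical diagnostic (not from the thread) to see what Docker's proxying costs per connection from the host:

import socket
from time import time

# Time a bare TCP connect to the forwarded gRPC port from the host; anything
# significant here is pure Docker networking overhead, independent of TorchServe.
start = time()
with socket.create_connection(('127.0.0.1', 7070), timeout = 5):
    pass
print(f'TCP connect took: {time() - start:.4f}s')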

Any way I can get around this latency?

Thanks for your time.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 7

Top GitHub Comments

1 reaction
lxning commented, Jan 27, 2022

@braindotai please feel free to reopen this ticket if there are any further issues.

0 reactions
lxning commented, Jan 21, 2022

@braindotai Thank you for the update. The Docker image is built on Ubuntu. You can try building a Docker image on Windows to see if that helps.

