
gRPC client ~4x slower than HTTP requests in Docker

See original GitHub issue

I'm running containerized TorchServe to deploy a simple face recognition model. I have a 20 MB TorchScript model that takes an input image of size 3x72x53 and returns K-dimensional features as the embeddings. Simply using the requests library in Python, my inference runtime is 0.02 seconds (which is very good, since it's very close to running the same TorchScript model locally), but running the same model with the gRPC client as described in the docs takes > 0.07 seconds.

Now out of curiosity, I ran the same client script inside the Docker container for both HTTP and gRPC, and found that gRPC is just as fast, if not faster. This suggests the issue has something to do with Docker's port forwarding.
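A cheap way to probe that theory (a hypothetical experiment, not something from the thread): gRPC resolves localhost through its own resolver and may prefer ::1 over 127.0.0.1, and Docker's userland proxy can treat the two addresses differently, so pinning the target address is worth a try.

import grpc

# Hypothetical experiment: compare channel setup against 'localhost' vs an
# explicit 127.0.0.1 target, then rerun the same 100-request timing loop on each.
for target in ('localhost:7070', '127.0.0.1:7070'):
    channel = grpc.insecure_channel(target)
    grpc.channel_ready_future(channel).result(timeout = 5)  # block until connected
    # ... run the benchmark from the client script below against this channel ...
    channel.close()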

Here’s my config.properties:

inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
number_of_netty_threads=32
job_queue_size=1000
model_store=/home/model-server/model-store
load_models=all
# install per-model Python deps, since the handler needs cv2
# (in a .properties file, '#' only starts a comment at the beginning of a line,
# so an inline comment would become part of the value)
install_py_dep_per_model=true
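
For completeness: the file above only configures the HTTP endpoints; TorchServe serves gRPC on ports 7070 (inference) and 7071 (management) by default. Newer TorchServe releases also accept explicit keys for these (key names from the TorchServe configuration docs; availability may depend on your version):

grpc_inference_port=7070
grpc_management_port=7071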

This is my handler:

import torch
import base64
from ts.torch_handler.base_handler import BaseHandler
from torchvision import transforms
from PIL import Image
import cv2
import numpy as np

class ModelHandler(BaseHandler):
    def initialize(self, context):
        super().initialize(context)

        self.transform = transforms.Compose([
            transforms.Resize((72, 54)),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
        ])

    def preprocess(self, data):
        images = []
        for instance in data:
            try:
                cv2_encoded_bytes = instance.get("data") or instance.get("body")
            except AttributeError:  # instance is already the raw payload, not a dict
                cv2_encoded_bytes = instance

            if isinstance(cv2_encoded_bytes, (bytearray, bytes)):
                cv2_decoded_bytes = base64.b64decode(cv2_encoded_bytes)
                cv2_decoded = np.frombuffer(cv2_decoded_bytes, dtype = np.uint8)
                cv2_image = cv2.imdecode(cv2_decoded, 1)

                # I'm sending the cv2.imencode output as the model input to keep
                # the network payload small, hence cv2.imdecode to get the image back

                image = Image.fromarray(cv2_image)
                image = self.transform(image)
            else:
                # already a decoded array of pixel values, not encoded bytes
                image = torch.FloatTensor(cv2_encoded_bytes)

            images.append(image)

        return torch.stack(images).to(self.device)

    def postprocess(self, inference_output): # same as BaseHandler
        return inference_output.tolist()
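
As an aside, the base64 → cv2 round trip in preprocess can be sanity-checked outside TorchServe. A minimal standalone sketch (not part of the original handler), mirroring what the client sends:

import base64
import cv2
import numpy as np

# encode exactly like the client does
img = np.ones((512, 512, 3), dtype = np.uint8)
_, enc = cv2.imencode('.jpg', img, [int(cv2.IMWRITE_JPEG_QUALITY), 50])
payload = base64.b64encode(enc)

# decode exactly like preprocess does
decoded = np.frombuffer(base64.b64decode(payload), dtype = np.uint8)
restored = cv2.imdecode(decoded, 1)
assert restored.shape == img.shape  # same HxWxC after the round trip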

And below is my client file with the logic for calling the model over HTTP and gRPC:

import cv2
import base64
import numpy as np
import requests
from time import time

import grpc
import inference_pb2
import inference_pb2_grpc
import management_pb2_grpc

cv2_image = np.ones((512, 512, 3), dtype = np.uint8)
h, w, c = cv2_image.shape
has_encoded, cv2_encoded = cv2.imencode('.jpg', cv2_image, [int(cv2.IMWRITE_JPEG_QUALITY), 50])
cv2_encoded_bytes = base64.b64encode(cv2_encoded)

def infer(stub, model_name, model_input):
    return stub.Predictions(
        inference_pb2.PredictionsRequest(
            model_name = model_name,
            input = {'data': model_input}
        )
    )
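
def parse_prediction(response):
    # Added for illustration: the gRPC response carries the payload as raw bytes.
    # The field name 'prediction' is assumed from TorchServe's inference.proto;
    # verify it against your generated inference_pb2.
    import json
    return json.loads(response.prediction)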

if __name__ == '__main__':
    channel = grpc.insecure_channel('localhost:7070')
    stub = inference_pb2_grpc.InferenceAPIsServiceStub(channel)

    times = []
    for _ in range(100):
        start = time()
        response = requests.post("http://localhost:8080/predictions/face-recognition/", data = cv2_encoded_bytes)
        # response = infer(stub, 'face-recognition', cv2_encoded_bytes)
        took = time() - start
        times.append(took)
        print(f'Took \t: {took}')

    times = np.array(times)
    print(f'Mean time \t: {np.mean(times)}')
    print(f'Median time \t: {np.median(times)}')

Outputs with HTTP when running from the host:

Mean time       : 0.025488979816436767
Median time     : 0.023711800575256348

Outputs with gRPC when running from the host:

Mean time       : 0.07445537805557251
Median time     : 0.07411670684814453

Outputs with HTTP when running within the container:

Mean time       : 0.02136315107345581
Median time     : 0.02019190788269043

Outputs with gRPC when running within the container:

Mean time       : 0.019478685855865478
Median time     : 0.01735556125640869
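
The container-vs-host gap points at the network path rather than the model itself. A quick, hypothetical diagnostic (not from the thread) to see what Docker's proxying costs per connection from the host:

import socket
from time import time

# Time a bare TCP connect to the forwarded gRPC port from the host; anything
# significant here is pure Docker networking overhead, independent of TorchServe.
start = time()
with socket.create_connection(('127.0.0.1', 7070), timeout = 5):
    pass
print(f'TCP connect took: {time() - start:.4f}s')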

Any way I can get around this latency?

Thanks for your time.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 7

Top GitHub Comments

1 reaction
lxning commented, Jan 27, 2022

@braindotai please feel free to reopen this ticket if there are any further issues.

0 reactions
lxning commented, Jan 21, 2022

@braindotai Thank you for the update. The Docker image is built on Ubuntu. You can try building a Docker image on Windows to see if that helps.

