
gRPC communication extremely slow


Description

I am following the Triton image classification example (https://docs.nvidia.com/deeplearning/triton-inference-server/master-user-guide/docs/client_example.html#section-image-classification-example), but when I use the gRPC protocol it is extremely slow.

Triton Information

What version of Triton are you using?

20.06

Are you using the Triton container or did you build it yourself?

The pre-built Triton container

To Reproduce

Steps to reproduce the behavior:

  1. Run the Triton server using the following script:
#!/bin/sh

home_path="/home/donnie/Documents/Repos/triton-inference-server"
docker_img="nvcr.io/nvidia/tritonserver:20.06-py3"
#docker_img="tritonserver:latest"

docker run --rm --shm-size=1g --ulimit memlock=-1 \
        --ulimit stack=67108864 \
        -p 8000:8000 -p 8001:8001 -p 8002:8002 \
        -v $home_path/docs/examples/model_repository:/models \
        $docker_img tritonserver --model-repository=/models
  2. Run the client container interactively: docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:20.06-py3-clientsdk

  3. Inside the client container, run the following command:

python image_client.py -i grpc -u localhost:8001 -m resnet50_netdef -s INCEPTION /workspace/images/mug.jpg

This takes almost a minute, whereas the same request over the REST API takes about a second or less.
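
For a quick apples-to-apples comparison, the same inference can be timed over both protocols with the tritonclient Python package. This is a minimal sketch, not the image_client.py code: recent releases ship the package as tritonclient (the 20.06 client SDK names the modules tritonhttpclient/tritongrpcclient instead), and the gpu_0/data tensor name is taken from the example resnet50_netdef config, so check it against your model's config.pbtxt.

import time
import numpy as np
import tritonclient.grpc as grpcclient
import tritonclient.http as httpclient

# Dummy NCHW input matching the example resnet50_netdef model.
data = np.random.rand(1, 3, 224, 224).astype(np.float32)

def time_one(module, url):
    # Build the request and time only the infer() round trip.
    client = module.InferenceServerClient(url=url)
    inp = module.InferInput("gpu_0/data", list(data.shape), "FP32")
    inp.set_data_from_numpy(data)
    start = time.perf_counter()
    client.infer("resnet50_netdef", inputs=[inp])
    return time.perf_counter() - start

print(f"gRPC: {time_one(grpcclient, 'localhost:8001'):.3f}s")
print(f"HTTP: {time_one(httpclient, 'localhost:8000'):.3f}s")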

Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).

I am using the example model configuration from the Triton-provided model repository.
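
For reference, the config.pbtxt that ships with that example looks roughly like the following (reconstructed from the Triton example docs, so treat field values such as max_batch_size as assumptions and check them against your copy):

name: "resnet50_netdef"
platform: "caffe2_netdef"
max_batch_size: 128
input [
  {
    name: "gpu_0/data"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "gpu_0/softmax"
    data_type: TYPE_FP32
    dims: [ 1000 ]
    label_filename: "resnet50_labels.txt"
  }
]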

Expected behavior

Between the REST API and gRPC, there should be no significant difference in communication time.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 13 (8 by maintainers)

Top GitHub Comments

1 reaction
DonnieKim411 commented, Jul 30, 2020

HA… I am experiencing the exact same problem described in that thread.

https://github.com/grpc/grpc/issues/22260#issuecomment-596040814

By the way, I also noticed that the Python side gets slower and slower as the loop goes on. It first took 2 s to finish each iteration; after several tens of iterations it took about 6 s, and after more iterations the latency could be as much as 30 s.

By observing the output of htop, I saw that one CPU core works at 100% during the transmission. Is this a configuration problem on my side or a bug in python-grpc?

When I request the model server concurrently, say 100 times, the latency gets larger and larger: early calls might take a couple of seconds, but later calls can take around 30 seconds. I also saw one CPU core working at 100% during the communication…

So it sounds like this is a problem with the gRPC client-side Python package, not Triton…
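
One client-side pattern worth ruling out before blaming the library: creating a new channel for every request. gRPC's performance guidance is to create a channel once and reuse the stub across calls, since every fresh channel pays TCP and HTTP/2 connection-setup costs. Below is a minimal sketch of the two patterns; the my_service_pb2* modules and the InferStub/InferRequest names are hypothetical placeholders for whatever your .proto generates, not Triton's API.

import grpc

# Hypothetical generated modules; substitute your own *_pb2 / *_pb2_grpc.
import my_service_pb2 as pb2
import my_service_pb2_grpc as pb2_grpc

# Slow pattern: a brand-new channel (and handshake) per request.
def infer_slow(payload):
    with grpc.insecure_channel("localhost:8001") as channel:
        return pb2_grpc.InferStub(channel).Infer(pb2.InferRequest(data=payload))

# Recommended pattern: one long-lived channel and stub, reused everywhere.
channel = grpc.insecure_channel("localhost:8001")
stub = pb2_grpc.InferStub(channel)

def infer_fast(payload):
    return stub.Infer(pb2.InferRequest(data=payload))

If the client already reuses a single connection (as image_client.py appears to), a slowdown that still grows over iterations points back at the interpreter-side issue in the linked grpc thread.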

Should I close this thread, then, since it is beyond the scope of Triton?

Also, do you have a Slack channel or some other means of communication? Sometimes I just want to ask about APIs, and the GitHub issues page is not suitable for that.

1 reaction
deadeyegoodwin commented, Jul 23, 2020

I think it is something related to your system. Notice how your user times are similar to what we see. It seems that for some reason your process isn't doing much on the CPU but is taking a lot of wall-clock time to finish.
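
A quick way to see that split is to compare wall-clock time against CPU time for a single call: a large wall number alongside a small CPU number means the process is mostly waiting (on the network, DNS, locks) rather than computing. A standard-library sketch, where infer_once is a hypothetical stand-in for whatever issues the request:

import time

def profile_call(fn, *args, **kwargs):
    # perf_counter measures wall-clock time; process_time measures
    # CPU time (user + system) consumed by this process.
    wall0, cpu0 = time.perf_counter(), time.process_time()
    result = fn(*args, **kwargs)
    wall1, cpu1 = time.perf_counter(), time.process_time()
    print(f"wall: {wall1 - wall0:.2f}s  cpu: {cpu1 - cpu0:.2f}s")
    return result

# Example usage with a hypothetical request function:
# profile_call(infer_once, "resnet50_netdef")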
