gRPC communication extremely slow
Description
I am following the Triton image classification example (https://docs.nvidia.com/deeplearning/triton-inference-server/master-user-guide/docs/client_example.html#section-image-classification-example), but when I use the gRPC protocol it is extremely slow.
Triton Information
What version of Triton are you using?
20.06
Are you using the Triton container or did you build it yourself?
The pre-built Triton container
To Reproduce
Steps to reproduce the behavior:
- Run the Triton server using the following script:
#!/bin/sh
home_path="/home/donnie/Documents/Repos/triton-inference-server"
docker_img="nvcr.io/nvidia/tritonserver:20.06-py3"
#docker_img="tritonserver:latest"
docker run --rm --shm-size=1g --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v $home_path/docs/examples/model_repository:/models \
  $docker_img tritonserver --model-repository=/models
- Run the client container interactively:
docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:20.06-py3-clientsdk
- Inside the client container, run the following command:
python image_client.py -i grpc -u localhost:8001 -m resnet50_netdef -s INCEPTION /workspace/images/mug.jpg
This takes close to a minute, whereas the same request over the REST API takes about a second or less (see the timing sketch below).
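For reference, the gRPC call can also be timed on its own, without the image preprocessing that image_client.py does. The snippet below is only a sketch: the import assumes the current tritonclient package (the 20.06 client SDK ships the same API as tritongrpcclient), and the tensor names and shape are assumed from the example resnet50_netdef configuration, so verify them against the model's config.pbtxt.

import time

import numpy as np
import tritonclient.grpc as grpcclient  # on the 20.06 SDK: import tritongrpcclient as grpcclient

# Connect to the gRPC endpoint published on port 8001 by the server container.
client = grpcclient.InferenceServerClient(url="localhost:8001")

# Assumed input/output names and NCHW shape for the example resnet50_netdef model.
inp = grpcclient.InferInput("gpu_0/data", [1, 3, 224, 224], "FP32")
inp.set_data_from_numpy(np.random.rand(1, 3, 224, 224).astype(np.float32))
out = grpcclient.InferRequestedOutput("gpu_0/softmax")

start = time.time()
client.infer(model_name="resnet50_netdef", inputs=[inp], outputs=[out])
print(f"single gRPC inference: {time.time() - start:.3f}s")

Swapping the import for the HTTP client module (tritonclient.http / tritonhttpclient) and the URL for localhost:8000 gives the REST-side number for comparison.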
Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).
I am using the example model configuration from the Triton-provided model repository.
Expected behavior
There should be no significant difference in communication time between the REST API and gRPC.
Issue Analytics
- Created 3 years ago
- Comments: 13 (8 by maintainers)
Top GitHub Comments
Ha… I am experiencing the exact same problem described in this thread:
https://github.com/grpc/grpc/issues/22260#issuecomment-596040814
When I request the model server concurrently, say 100 times, the latency gets larger and larger: the first calls might take a couple of seconds, but later calls can take around 30 seconds. I also saw one CPU core working at 100% during the communication (roughly what the sketch below reproduces)…
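For context, the concurrent test looks roughly like this; it is a sketch rather than my exact script, and it reuses the assumed tensor names from the reproduce steps above.

import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import tritonclient.grpc as grpcclient  # 20.06 SDK: import tritongrpcclient as grpcclient


def timed_infer(i):
    # One fresh client (and gRPC channel) per request, mimicking independent callers.
    client = grpcclient.InferenceServerClient(url="localhost:8001")
    inp = grpcclient.InferInput("gpu_0/data", [1, 3, 224, 224], "FP32")
    inp.set_data_from_numpy(np.random.rand(1, 3, 224, 224).astype(np.float32))
    start = time.time()
    client.infer(model_name="resnet50_netdef", inputs=[inp])
    return i, time.time() - start


# Issue ~100 requests concurrently and print each call's wall-clock latency;
# the later calls come back far slower than the early ones while one CPU core is pinned at 100%.
with ThreadPoolExecutor(max_workers=100) as pool:
    for i, latency in pool.map(timed_infer, range(100)):
        print(f"request {i}: {latency:.2f}s")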
So it now sounds like this is a problem with the client-side gRPC Python package, not Triton…
Should I close this thread then (since it is beyond the scope of the Triton package)?
Also, do you have a Slack channel or some other means of communication? Sometimes I just want to ask about APIs, and the GitHub issues page is not suitable for that purpose.
I think it is something related to your system. Notice how your user times are similar to what we see. It seems that for some reason your process isn't doing anything on the CPU but is taking a lot of wall-clock time to finish.