gRPC communication extremely slow
Description
I am following the Triton image classification example (https://docs.nvidia.com/deeplearning/triton-inference-server/master-user-guide/docs/client_example.html#section-image-classification-example), but when I use the gRPC protocol it is extremely slow.
Triton Information
What version of Triton are you using?
20.06
Are you using the Triton container or did you build it yourself?
The pre-built Triton container
To Reproduce
Steps to reproduce the behavior:
- Run the Triton server using the following script:
#!/bin/sh
home_path="/home/donnie/Documents/Repos/triton-inference-server"
docker_img="nvcr.io/nvidia/tritonserver:20.06-py3"
#docker_img="tritonserver:latest"
docker run --rm --shm-size=1g --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v $home_path/docs/examples/model_repository:/models \
  $docker_img tritonserver --model-repository=/models
- Run the client container interactively:
docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:20.06-py3-clientsdk
- Inside the client container, run the following command:
python image_client.py -i grpc -u localhost:8001 -m resnet50_netdef -s INCEPTION /workspace/images/mug.jpg
This takes close to a minute, whereas the same request over the REST API takes about a second or less (see the timing sketch below).
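For reference, the gRPC call can also be timed on its own, without the image preprocessing that image_client.py does. The snippet below is only a sketch: the import assumes the current tritonclient package (the 20.06 client SDK ships the same API as tritongrpcclient), and the tensor names and shape are assumed from the example resnet50_netdef configuration, so verify them against the model's config.pbtxt.

import time

import numpy as np
import tritonclient.grpc as grpcclient  # on the 20.06 SDK: import tritongrpcclient as grpcclient

# Connect to the gRPC endpoint published on port 8001 by the server container.
client = grpcclient.InferenceServerClient(url="localhost:8001")

# Assumed input/output names and NCHW shape for the example resnet50_netdef model.
inp = grpcclient.InferInput("gpu_0/data", [1, 3, 224, 224], "FP32")
inp.set_data_from_numpy(np.random.rand(1, 3, 224, 224).astype(np.float32))
out = grpcclient.InferRequestedOutput("gpu_0/softmax")

start = time.time()
client.infer(model_name="resnet50_netdef", inputs=[inp], outputs=[out])
print(f"single gRPC inference: {time.time() - start:.3f}s")

Swapping the import for the HTTP client module (tritonclient.http / tritonhttpclient) and the URL for localhost:8000 gives the REST-side number for comparison.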
Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).
I am using the example model configuration from the Triton-provided model repository.
Expected behavior
There should be no significant difference in communication time between the REST API and gRPC.
Issue Analytics
- Created 3 years ago
- Comments: 13 (8 by maintainers)
Top GitHub Comments
Ha… I am experiencing the exact same problem described in this thread:
https://github.com/grpc/grpc/issues/22260#issuecomment-596040814
When I request the model server concurrently, say 100 times, the latency gets larger and larger: the first calls might take a couple of seconds, but later calls can take around 30 seconds. I also saw one CPU core working at 100% during the communication (roughly what the sketch below reproduces)…
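For context, the concurrent test looks roughly like this; it is a sketch rather than my exact script, and it reuses the assumed tensor names from the reproduce steps above.

import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import tritonclient.grpc as grpcclient  # 20.06 SDK: import tritongrpcclient as grpcclient


def timed_infer(i):
    # One fresh client (and gRPC channel) per request, mimicking independent callers.
    client = grpcclient.InferenceServerClient(url="localhost:8001")
    inp = grpcclient.InferInput("gpu_0/data", [1, 3, 224, 224], "FP32")
    inp.set_data_from_numpy(np.random.rand(1, 3, 224, 224).astype(np.float32))
    start = time.time()
    client.infer(model_name="resnet50_netdef", inputs=[inp])
    return i, time.time() - start


# Issue ~100 requests concurrently and print each call's wall-clock latency;
# the later calls come back far slower than the early ones while one CPU core is pinned at 100%.
with ThreadPoolExecutor(max_workers=100) as pool:
    for i, latency in pool.map(timed_infer, range(100)):
        print(f"request {i}: {latency:.2f}s")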
So it now sounds like this is a problem with the client-side gRPC Python package, not Triton…
Should I close this thread then (since it is beyond the scope of the Triton package)?
Also, do you have a Slack channel or some other means of communication? Sometimes I just want to ask about APIs, and the GitHub issues page is not suitable for that purpose.
I think it is something related to your system. Notice how your user times are similar to what we see. It seems that for some reason your process isn't doing anything on the CPU but is taking a lot of wall-clock time to finish.