Triton server with python backend slow for YOLO inferencing
Objective:
Running YOLOv5 with Triton Inference Server to perform inference. The input source is a real-time video stream from an RTSP URL.
Setup: Followed the template below to run my own custom code: https://github.com/triton-inference-server/python_backend/tree/main/examples/add_sub
Server Image: nvcr.io/nvidia/tritonserver:22.08-pyt-python-py3
Code Change:
Changes to examples/custom_yolo/model.py (a minimal sketch of these steps follows this list):
- Read the input as in the add_sub example (for request in requests: ...)
- Perform YOLO predictions
- Return the response
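For reference, a minimal sketch of what these steps could look like in model.py, following the add_sub template. The tensor names INPUT_IMAGE/DETECTIONS and the torch.hub YOLOv5 loading are assumptions for illustration, not the actual code from the issue:

```python
# model.py sketch for the Triton Python backend.
# Assumptions: input "INPUT_IMAGE" (HWC uint8) and output "DETECTIONS"
# match config.pbtxt; YOLOv5 is loaded via torch.hub (hypothetical choice).
import numpy as np
import torch
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # Load YOLOv5 once per model instance.
        self.model = torch.hub.load("ultralytics/yolov5", "yolov5s")

    def execute(self, requests):
        responses = []
        for request in requests:
            # Incoming frame as an HWC uint8 numpy array.
            frame = pb_utils.get_input_tensor_by_name(
                request, "INPUT_IMAGE").as_numpy()

            # Run YOLO prediction on the frame.
            results = self.model(frame)
            detections = results.xyxy[0].cpu().numpy().astype(np.float32)

            # Wrap the detections in a Triton output tensor and respond.
            out_tensor = pb_utils.Tensor("DETECTIONS", detections)
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        return responses

    def finalize(self):
        self.model = None
```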
Running the model.py
cd python_backend
mkdir -p models/custom_yolo/1/
cp examples/custom_yolo/model.py models/custom_yolo/1/model.py
cp examples/custom_yolo/config.pbtxt models/custom_yolo/config.pbtxt
tritonserver --model-repository `pwd`/models
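The config.pbtxt copied above is not shown in the issue. A minimal sketch for a Python-backend model, assuming the same hypothetical tensor names as the model.py sketch and a variable-size HWC uint8 image input, might look like:

```
name: "custom_yolo"
backend: "python"
max_batch_size: 0

input [
  {
    name: "INPUT_IMAGE"
    data_type: TYPE_UINT8
    dims: [ -1, -1, 3 ]
  }
]
output [
  {
    name: "DETECTIONS"
    data_type: TYPE_FP32
    dims: [ -1, 6 ]
  }
]

instance_group [
  {
    count: 1
    kind: KIND_GPU
  }
]
```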
Client Image: nvcr.io/nvidia/tritonserver:22.08-py3-sdk
Code Change:
Changes to client.py – input source is an RTSP stream (a minimal sketch follows this list):
- Read the stream frame by frame
- Convert each frame to the Triton input format (similar to the add_sub example)
- Send it to the server for inference
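A minimal sketch of such a client, assuming the HTTP Triton client on localhost:8000 and the same hypothetical model/tensor names as above (the RTSP URL is a placeholder):

```python
# client.py sketch. Assumptions: HTTP client on localhost:8000, model name
# "custom_yolo", tensor names "INPUT_IMAGE"/"DETECTIONS", placeholder RTSP URL.
import cv2
import numpy as np
import tritonclient.http as httpclient

RTSP_URL = "rtsp://<camera-ip>/stream"  # placeholder

client = httpclient.InferenceServerClient(url="localhost:8000")
cap = cv2.VideoCapture(RTSP_URL)

while True:
    # Read the stream frame by frame.
    ok, frame = cap.read()
    if not ok:
        break

    # Convert the frame to the Triton input format (HWC uint8 tensor).
    inp = httpclient.InferInput("INPUT_IMAGE", list(frame.shape), "UINT8")
    inp.set_data_from_numpy(frame.astype(np.uint8))

    # Send to the server for inference and read back the detections.
    result = client.infer(model_name="custom_yolo", inputs=[inp])
    detections = result.as_numpy("DETECTIONS")
    print(detections.shape)

cap.release()
```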
Running client: python3 triton_inference_server_python_backend/examples/custom_yolo/client.py
Results
Ran the Triton server on an Azure GPU VM (Standard NC6s v3: 6 vCPUs, 112 GiB memory). The Triton setup runs at about 15 FPS, which is very slow. Without Triton, running YOLO alone achieves >= 30 FPS.
Question
- Is the expectation that Triton should perform better than, or at least on par with, a standalone setup correct?
- Is something being done wrong or missed?
- What are the remedial steps?
My final inferencing pipeline is:
- Read live streams coming from N cameras (N >= 5)
- Perform inference with a series of different models (Model 1 (YOLO) --> Model 2 (custom model) --> Model 3 (custom model) --> save results to the cloud); a rough sketch of chaining models from the Python backend follows this list
- For the above, is Triton the correct choice?
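One way a chain like this can be expressed while staying on the Python backend is Triton's BLS (business logic scripting) API, where one Python model calls the others. The sketch below uses hypothetical model and tensor names; Triton ensembles are another option:

```python
# BLS sketch intended to run inside a Python-backend model's execute().
# Assumptions: model names "custom_yolo"/"model_2" and tensor names
# "INPUT_IMAGE"/"DETECTIONS"/"INPUT"/"OUTPUT" are hypothetical.
import triton_python_backend_utils as pb_utils


def run_pipeline(frame_tensor):
    # frame_tensor: a pb_utils.Tensor whose name matches custom_yolo's input
    # (here assumed to be "INPUT_IMAGE").
    # Step 1: YOLO detections from the first model.
    yolo_request = pb_utils.InferenceRequest(
        model_name="custom_yolo",
        requested_output_names=["DETECTIONS"],
        inputs=[frame_tensor])
    yolo_response = yolo_request.exec()
    det_np = pb_utils.get_output_tensor_by_name(
        yolo_response, "DETECTIONS").as_numpy()

    # Step 2: feed the detections into the next custom model.
    next_request = pb_utils.InferenceRequest(
        model_name="model_2",
        requested_output_names=["OUTPUT"],
        inputs=[pb_utils.Tensor("INPUT", det_np)])
    return next_request.exec()
```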
Top GitHub Comments
@tanmayv25 Thanks a lot for your help. Will go through that.
Unfortunately, I too don’t have any hands-on experience with NVIDIA DeepStream. From their documentation, it definitely looks like they support most of the use cases. For the Triton plugin within DeepStream, they say the TensorFlow and PyTorch backends are supported, so I am not sure whether custom Triton backends would be supported. The Python backend suffers from extra data copies, which will have an adverse effect on performance. You can write your custom logic as a C++ backend to get more performance. See example backends here: https://github.com/triton-inference-server/backend/tree/main/examples
Looks like there are lots of webinars and technical blog posts here: https://developer.nvidia.com/deepstream-getting-started#introduction