Triton server with python backend slow for YOLO inferencing
Objective:
Running YOLOv5 with Triton Inference Server to perform inference. The input source is a real-time video stream from an RTSP URL.
Setup: Followed the template below to run my own custom code: https://github.com/triton-inference-server/python_backend/tree/main/examples/add_sub
Server Image: nvcr.io/nvidia/tritonserver:22.08-pyt-python-py3
Code Change:
Changes to examples/custom_yolo/model.py (a minimal sketch of these steps follows this list):
- Read the input as in the add_sub example (for request in requests: ...)
- Perform YOLO predictions
- Return the response
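For reference, a minimal sketch of what these steps could look like in model.py, following the add_sub template. The tensor names INPUT_IMAGE/DETECTIONS and the torch.hub YOLOv5 loading are assumptions for illustration, not the actual code from the issue:

```python
# model.py sketch for the Triton Python backend.
# Assumptions: input "INPUT_IMAGE" (HWC uint8) and output "DETECTIONS"
# match config.pbtxt; YOLOv5 is loaded via torch.hub (hypothetical choice).
import numpy as np
import torch
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # Load YOLOv5 once per model instance.
        self.model = torch.hub.load("ultralytics/yolov5", "yolov5s")

    def execute(self, requests):
        responses = []
        for request in requests:
            # Incoming frame as an HWC uint8 numpy array.
            frame = pb_utils.get_input_tensor_by_name(
                request, "INPUT_IMAGE").as_numpy()

            # Run YOLO prediction on the frame.
            results = self.model(frame)
            detections = results.xyxy[0].cpu().numpy().astype(np.float32)

            # Wrap the detections in a Triton output tensor and respond.
            out_tensor = pb_utils.Tensor("DETECTIONS", detections)
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        return responses

    def finalize(self):
        self.model = None
```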
Running the model.py
cd python_backend
mkdir -p models/custom_yolo/1/
cp examples/custom_yolo/model.py models/custom_yolo/1/model.py
cp examples/custom_yolo/config.pbtxt models/custom_yolo/config.pbtxt
tritonserver --model-repository `pwd`/models
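The config.pbtxt copied above is not shown in the issue. A minimal sketch for a Python-backend model, assuming the same hypothetical tensor names as the model.py sketch and a variable-size HWC uint8 image input, might look like:

```
name: "custom_yolo"
backend: "python"
max_batch_size: 0

input [
  {
    name: "INPUT_IMAGE"
    data_type: TYPE_UINT8
    dims: [ -1, -1, 3 ]
  }
]
output [
  {
    name: "DETECTIONS"
    data_type: TYPE_FP32
    dims: [ -1, 6 ]
  }
]

instance_group [
  {
    count: 1
    kind: KIND_GPU
  }
]
```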
Client Image: nvcr.io/nvidia/tritonserver:22.08-py3-sdk
Code Change:
Changes to client.py – input source is an RTSP stream (a minimal sketch follows this list):
- Read the stream frame by frame
- Convert each frame to the Triton input format (similar to the add_sub example)
- Send it to the server for inference
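A minimal sketch of such a client, assuming the HTTP Triton client on localhost:8000 and the same hypothetical model/tensor names as above (the RTSP URL is a placeholder):

```python
# client.py sketch. Assumptions: HTTP client on localhost:8000, model name
# "custom_yolo", tensor names "INPUT_IMAGE"/"DETECTIONS", placeholder RTSP URL.
import cv2
import numpy as np
import tritonclient.http as httpclient

RTSP_URL = "rtsp://<camera-ip>/stream"  # placeholder

client = httpclient.InferenceServerClient(url="localhost:8000")
cap = cv2.VideoCapture(RTSP_URL)

while True:
    # Read the stream frame by frame.
    ok, frame = cap.read()
    if not ok:
        break

    # Convert the frame to the Triton input format (HWC uint8 tensor).
    inp = httpclient.InferInput("INPUT_IMAGE", list(frame.shape), "UINT8")
    inp.set_data_from_numpy(frame.astype(np.uint8))

    # Send to the server for inference and read back the detections.
    result = client.infer(model_name="custom_yolo", inputs=[inp])
    detections = result.as_numpy("DETECTIONS")
    print(detections.shape)

cap.release()
```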
Running client: python3 triton_inference_server_python_backend/examples/custom_yolo/client.py
Results
Ran the Triton server on an Azure GPU VM (Standard NC6s v3: 6 vCPUs, 112 GiB memory). The Triton setup runs at about 15 FPS, which is very slow. Without Triton, running YOLO alone achieves >= 30 FPS.
Question
- Is the expectation that Triton should perform better than, or at least on par with, a standalone setup correct?
- Is something being done wrong or missed?
- What are the remedial steps?
My final inferencing pipeline is:
- Read live streams coming from N cameras (N >= 5)
- Perform inference with a series of different models (Model 1 (YOLO) --> Model 2 (custom model) --> Model 3 (custom model) --> save results to the cloud); a rough sketch of chaining models from the Python backend follows this list
- For the above, is Triton the correct choice?
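One way a chain like this can be expressed while staying on the Python backend is Triton's BLS (business logic scripting) API, where one Python model calls the others. The sketch below uses hypothetical model and tensor names; Triton ensembles are another option:

```python
# BLS sketch intended to run inside a Python-backend model's execute().
# Assumptions: model names "custom_yolo"/"model_2" and tensor names
# "INPUT_IMAGE"/"DETECTIONS"/"INPUT"/"OUTPUT" are hypothetical.
import triton_python_backend_utils as pb_utils


def run_pipeline(frame_tensor):
    # frame_tensor: a pb_utils.Tensor whose name matches custom_yolo's input
    # (here assumed to be "INPUT_IMAGE").
    # Step 1: YOLO detections from the first model.
    yolo_request = pb_utils.InferenceRequest(
        model_name="custom_yolo",
        requested_output_names=["DETECTIONS"],
        inputs=[frame_tensor])
    yolo_response = yolo_request.exec()
    det_np = pb_utils.get_output_tensor_by_name(
        yolo_response, "DETECTIONS").as_numpy()

    # Step 2: feed the detections into the next custom model.
    next_request = pb_utils.InferenceRequest(
        model_name="model_2",
        requested_output_names=["OUTPUT"],
        inputs=[pb_utils.Tensor("INPUT", det_np)])
    return next_request.exec()
```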
Top GitHub Comments
@tanmayv25 Thanks a lot for your help. Will go through that.
Unfortunately, I too don’t have any hands-on experience with NVIDIA DeepStream. From their documentation, it definitely looks like they support most of the use cases. For the Triton plugin within DeepStream, they say the TensorFlow and PyTorch backends are supported, so I am not sure whether custom Triton backends would be supported. The Python backend suffers from extra data copies, which will have an adverse effect on performance. You can write your custom logic as a C++ backend to get more performance. See example backends here: https://github.com/triton-inference-server/backend/tree/main/examples
Looks like there are lots of webinars and technical blog posts here: https://developer.nvidia.com/deepstream-getting-started#introduction