Triton server with python backend slow for YOLO inferencing

See original GitHub issue

Objective:

Running YOLOv5 with Triton Inference Server to perform inference. The input source is a real-time video stream accessed through an RTSP URL.

Setup: Followed the template below to run my own custom code: https://github.com/triton-inference-server/python_backend/tree/main/examples/add_sub

Server Image: nvcr.io/nvidia/tritonserver:22.08-pyt-python-py3

Code Change:

Changes to examples/custom_yolo/model.py (a hedged sketch of such a model.py follows the list below):

  • Read input as per the add_sub example --> (for request in requests: .....)
  • Perform YOLO predictions
  • Return the response
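
Since the issue does not include the actual model.py, here is a minimal hedged sketch of what such a Python-backend model could look like, following the add_sub template. The tensor names (INPUT0/OUTPUT0), the shapes, and the torch.hub way of loading YOLOv5 are assumptions, not taken from the issue:

```python
# models/custom_yolo/1/model.py -- hypothetical sketch, not the author's actual code
import numpy as np
import torch
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # Assumption: YOLOv5 is loaded once at startup via torch.hub;
        # the issue does not say how the model is loaded in the backend.
        self.model = torch.hub.load("ultralytics/yolov5", "yolov5s")
        self.model.to("cuda" if torch.cuda.is_available() else "cpu")

    def execute(self, requests):
        responses = []
        # Same request loop as the add_sub example.
        for request in requests:
            # Assumption: config.pbtxt declares one image input "INPUT0"
            # (HWC uint8) and one detection output "OUTPUT0".
            frame = pb_utils.get_input_tensor_by_name(request, "INPUT0").as_numpy()

            # Run YOLOv5 and flatten the detections to an Nx6 float32 array:
            # (x1, y1, x2, y2, confidence, class).
            detections = self.model(frame).xyxy[0].cpu().numpy().astype(np.float32)

            out_tensor = pb_utils.Tensor("OUTPUT0", detections)
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out_tensor])
            )
        return responses

    def finalize(self):
        self.model = None
```

The names passed to get_input_tensor_by_name and pb_utils.Tensor must match whatever the model's config.pbtxt actually declares.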

Running model.py:

    cd python_backend
    mkdir -p models/custom_yolo/1/
    cp examples/custom_yolo/model.py models/custom_yolo/1/model.py
    cp examples/custom_yolo/config.pbtxt models/custom_yolo/config.pbtxt
    tritonserver --model-repository `pwd`/models

Client Image: nvcr.io/nvidia/tritonserver:22.08-py3-sdk (container started with /bin/bash)

Code Change:

Changes to client.py – the input source is an RTSP stream (a hedged sketch of this loop follows the list below):

  • Read the stream frame by frame
  • Convert each frame to the Triton input format (similar to the add_sub example)
  • Send it to the server for inference
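
A hedged sketch of the client loop described above. The RTSP URL, tensor names, and dtypes are placeholders, and gRPC is used here even though the issue does not say which protocol the author's client.py uses:

```python
# client.py -- hypothetical sketch of the loop described above
import cv2
import numpy as np
import tritonclient.grpc as grpcclient

RTSP_URL = "rtsp://example.com/stream"   # placeholder URL
MODEL_NAME = "custom_yolo"

client = grpcclient.InferenceServerClient(url="localhost:8001")
cap = cv2.VideoCapture(RTSP_URL)

while cap.isOpened():
    ok, frame = cap.read()                      # read frame by frame
    if not ok:
        break

    # Convert the frame to the Triton input format (as in add_sub):
    # the name, shape, and dtype must match the model's config.pbtxt.
    infer_input = grpcclient.InferInput("INPUT0", list(frame.shape), "UINT8")
    infer_input.set_data_from_numpy(frame.astype(np.uint8))

    # Send the frame to the server for inference and read the detections back.
    result = client.infer(model_name=MODEL_NAME, inputs=[infer_input])
    detections = result.as_numpy("OUTPUT0")
    # ... post-process / display the detections here ...

cap.release()
```

Note that this sends one synchronous request per frame; batching frames or switching to async_infer would be a usual first step if per-request overhead turns out to dominate.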

Running client: python3 triton_inference_server_python_backend/examples/custom_yolo/client.py

Results: Ran the Triton server on an Azure GPU VM (Standard NC6s v3: 6 vCPUs, 112 GiB memory).

The Triton server runs at about 15 FPS, which is very slow. Without Triton, running YOLO alone achieves >= 30 FPS.

Question

  • Is the expectation that Triton should perform better than, or on par with, the standalone setup correct?
  • Is something being done wrong, or is something being missed?
  • What are the remedial steps?

My final inferencing pipeline is:

  • Read live streams coming from N cameras (N >= 5)
  • Perform inference with a series of different models (Model 1 (YOLO) --> Model 2 (custom model) --> Model 3 (custom model) --> save results to the cloud)
  • For the above, is Triton the correct selection? (A hedged sketch of how such a chain could be expressed in Triton follows below.)
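
On the last point: Triton can chain models either through an ensemble defined in the model repository or through Business Logic Scripting (BLS) calls made from a Python-backend model. The sketch below is a hypothetical BLS-style execute(); the model names (custom_yolo, custom_model_2) and tensor names are placeholders, not anything confirmed in this issue:

```python
# Hypothetical BLS-style execute() chaining two models from a Python-backend
# "pipeline" model; model and tensor names are placeholders.
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            frame = pb_utils.get_input_tensor_by_name(request, "INPUT0")

            # Step 1: run YOLO on another Triton model.
            yolo_req = pb_utils.InferenceRequest(
                model_name="custom_yolo",
                requested_output_names=["OUTPUT0"],
                inputs=[frame],
            )
            yolo_resp = yolo_req.exec()
            if yolo_resp.has_error():
                raise pb_utils.TritonModelException(yolo_resp.error().message())
            dets = pb_utils.get_output_tensor_by_name(yolo_resp, "OUTPUT0").as_numpy()

            # Step 2: feed the detections into a downstream custom model.
            custom_req = pb_utils.InferenceRequest(
                model_name="custom_model_2",
                requested_output_names=["RESULT"],
                inputs=[pb_utils.Tensor("INPUT0", dets)],
            )
            custom_resp = custom_req.exec()
            if custom_resp.has_error():
                raise pb_utils.TritonModelException(custom_resp.error().message())
            result = pb_utils.get_output_tensor_by_name(custom_resp, "RESULT")

            responses.append(pb_utils.InferenceResponse(output_tensors=[result]))
        return responses
```

An ensemble keeps the whole chain inside Triton's scheduler, while BLS allows arbitrary Python logic between the steps; neither approach removes the extra data copies of the Python backend mentioned in the comments below.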

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments:7 (3 by maintainers)

Top GitHub Comments

1 reaction
rahul1728jha commented, Oct 18, 2022

@tanmayv25 Thanks a lot for your help. Will go through that

0 reactions
tanmayv25 commented, Oct 17, 2022

Unfortunately, I too don’t have any hands-on experience with NVIDIA DeepStream. From their documentation, it definitely looks like they support most of the use cases. For the Triton plugin within DeepStream, they say the TensorFlow and PyTorch backends are supported, so I am not sure whether custom Triton backends would be supported. The Python backend suffers from extra data copies, which will have an adverse effect on performance. You can write your custom logic as a C++ backend to get more performance. See example backends here: https://github.com/triton-inference-server/backend/tree/main/examples

It looks like there are lots of webinars and technical blog posts here: https://developer.nvidia.com/deepstream-getting-started#introduction

