
Performance on Triton with Python backend

See original GitHub issue

Description

Hello,

  • I was expecting Triton to outperform TorchServe based on the documentation I read. But in the images below, I see that, with the same model, Triton's throughput is lower than TorchServe's, even though I used a TensorRT model in Triton (TensorRT model inference is faster than PyTorch model inference).

Screenshot from 2021-08-25 18-29-01 (throughput comparison)

  • Could something be wrong somewhere in my setup?

Triton Information

Triton version: 21.07

To Reproduce

This is what I did:

  • In the model repository, using the Python backend, I added a model (created a model.py file, added the weights of the trained TensorRT model, created a config.pbtxt file, built the Python backend stub, and conda-packed my environment, …)
  • In the TritonPythonModel class in model.py (a minimal sketch follows this list):
    • I load the pretrained model in the initialize function.
    • In the execute function, I run my model's inference and convert the output into the format required for the Triton response.
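
For reference, here is a minimal model.py skeleton for the Python backend, assuming the usual repository layout (models/<model_name>/config.pbtxt and models/<model_name>/1/model.py). The tensor names INPUT__0/OUTPUT__0 and the TensorRT loading step are placeholders, not the original poster's code, and must match the names declared in config.pbtxt:

    import json
    import numpy as np
    # Available only inside Triton's Python backend runtime.
    import triton_python_backend_utils as pb_utils

    class TritonPythonModel:
        def initialize(self, args):
            # args["model_config"] holds the config.pbtxt contents as a JSON string.
            self.model_config = json.loads(args["model_config"])
            # Load the TensorRT engine (or any other model) once, here.
            # self.model = load_tensorrt_engine("model.plan")  # placeholder

        def execute(self, requests):
            responses = []
            for request in requests:
                in_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT__0")
                batch = in_tensor.as_numpy()

                # output = self.model.infer(batch)  # model-specific inference step
                output = batch  # placeholder so the sketch stands on its own

                out_tensor = pb_utils.Tensor("OUTPUT__0", output.astype(np.float32))
                responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
            return responses

        def finalize(self):
            pass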

Expected behavior

Am I doing this right? If I'm wrong, can you point me in the right direction?

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

2 reactions
Tabrizian commented on Aug 25, 2021

@gioipv Could you please share your model.py file? From your overall description, it seems like you are doing it correctly. Which inference solution does the “model inference” column use? Is it using the PyTorch backend?

How did you measure the performance of your Python models? Did you use Perf Analyzer?
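
For context, Perf Analyzer ships with the Triton client tools and drives the deployed model directly, reporting throughput and latency percentiles at each concurrency level. A typical invocation (the model name and concurrency range below are illustrative) looks roughly like:

    perf_analyzer -m my_python_model -u localhost:8001 -i grpc --concurrency-range 1:4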

0 reactions
Tabrizian commented on Aug 30, 2021

@gioipv Thanks for sharing the results.

> What do you think about that: (Triton's client API can take a long time)

The Triton Python client may add some latency because it uses the Python API, which can be slower. Also, if you want to try concurrency values higher than 1, it would be harder to create the same scenario using the Python client.
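
For illustration, a single synchronous request through the Triton Python gRPC client looks roughly like the sketch below; the model and tensor names are placeholders. Looping over blocking calls like this keeps the effective concurrency at 1 and adds Python-side overhead to every request, which is why Perf Analyzer (or an async/multi-threaded client) is the easier way to test higher concurrency values:

    import numpy as np
    import tritonclient.grpc as grpcclient

    # Hypothetical model/tensor names -- replace with the ones in your config.pbtxt.
    client = grpcclient.InferenceServerClient(url="localhost:8001")

    data = np.random.rand(1, 3, 224, 224).astype(np.float32)
    inp = grpcclient.InferInput("INPUT__0", list(data.shape), "FP32")
    inp.set_data_from_numpy(data)
    out = grpcclient.InferRequestedOutput("OUTPUT__0")

    # One blocking call; wall-clock time here includes the Python client overhead.
    result = client.infer(model_name="my_python_model", inputs=[inp], outputs=[out])
    print(result.as_numpy("OUTPUT__0").shape)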

Read more comments on GitHub.

Top Results From Across the Web

  • triton python backend load time of pytorch model is 4x slower ...
    When I deploy the model using the Python backend, the loading time is around 0.8 seconds. However, if I load the ONNX model without...
  • 1. Introduction — Poplar Triton Backend - Graphcore Documents
    1.7.1. Triton performance analyzer and metrics ... For each batch of requests the Poplar backend will provide compute and execution times...
  • Solving AI Inference Challenges with NVIDIA Triton
    Using the Python or C++ backends, you can define a custom script that can call any other model being served by Triton based...
  • Use Triton Inference Server with Amazon SageMaker
    The Triton Python backend uses shared memory (SHMEM) to connect your code to Triton. SageMaker Inference provides up to half of the instance...
  • High-performance serving with Triton Inference Server (Preview)
    Triton is multi-framework, open-source software that is optimized for inference. It supports popular machine learning frameworks like TensorFlow ...
