
Triton Server Docker Image with ONNXRuntime support


Is your feature request related to a problem? Please describe.

Currently, we are using the full tritonserver docker image (i.e. xx.yy-py3) in order to use our ONNXRuntime models. However, the image size is too large for us and we would like to decrease it by having a tritonserver docker image that only supports ONNXRuntime models, similar to how there is xx.yy-tf2-python-py3 for TensorFlow 2.x and xx.yy-pyt-python-py3 for PyTorch.

One constraint we have is that, at least in the near future, we do not want to manage tritonserver image updates manually by cloning the git repository and using the compose.py file. That is why we would prefer an official image from NVIDIA's container registry (NGC).

Describe the solution you’d like

  • An officially supported tritonserver docker image with ONNXRuntime and Python backends only.

Describe alternatives you’ve considered

  • A python tritonserver wheel with ONNXRuntime support that can be installed via pip so that we can use the xx.yy-py3-min image in a Dockerfile to build our own custom image.
  • A tritonserver shell script that will allow us to build tritonserver with ONNXRuntime support without the need to clone the git repository, so that we can use the xx.yy-py3-min image in a Dockerfile to build our own custom image (a rough illustration of such a custom Dockerfile follows this list).
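
For illustration only, here is a rough sketch of the kind of custom Dockerfile both alternatives have in mind: start from the xx.yy-py3-min base and add only the ONNXRuntime and Python backends, in this case by copying them out of the full image. The tags and /opt/tritonserver paths are assumptions based on the standard container layout, and the Dockerfile that compose.py generates additionally sets environment variables and entrypoint scripts, so treat this purely as a starting point rather than an official recipe.

# Hypothetical sketch only: copy the server binaries and selected backends from the
# full image into the -min base. Tags and paths are assumptions, not an official recipe.
cat > Dockerfile.onnx <<'EOF'
FROM nvcr.io/nvidia/tritonserver:22.05-py3 AS full
FROM nvcr.io/nvidia/tritonserver:22.05-py3-min
COPY --from=full /opt/tritonserver/bin /opt/tritonserver/bin
COPY --from=full /opt/tritonserver/lib /opt/tritonserver/lib
COPY --from=full /opt/tritonserver/backends/onnxruntime /opt/tritonserver/backends/onnxruntime
COPY --from=full /opt/tritonserver/backends/python /opt/tritonserver/backends/python
EOF
docker build -f Dockerfile.onnx -t tritonserver-onnx-slim .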

Additional context

None

Issue Analytics

  • State: closed
  • Created: a year ago
  • Reactions: 1
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

3 reactions
yaysummeriscoming commented, Jun 29, 2022

+1 to this request. Understand that there are a lot of combinations to look after, but I think ORT is far more important than, say, TF2. Triton is not a tool for beginners, and who uses TF2 these days?

Maintaining & hosting a separate Triton build is some work, whereas on your side it’s just another configuration.

1 reaction
jbkyang-nvi commented, Jun 29, 2022

@onurcayci

A tritonserver shell script that will allow us to build tritonserver with ONNXRuntime support without the need to clone the git repository so that we can use the xx.yy-py3-min image in a Dockerfile to build our own custom image.

What is the problem with running compose.py? Is it missing functionality? It seems like you only need to run:

git clone --single-branch --depth=1 -b <version number such as r22.05> https://github.com/triton-inference-server/server.git
cd server
python3 compose.py --backend onnxruntime --backend python

Is the image taking too much space?

@yaysummeriscoming As you noted, we could see some explosion in the combinations we maintain, not to mention all the testing we have to do for each specific image. Setting up the framework for such support will take some time, or a lot of customer interest to justify building one specific configuration. In the meantime, if there is commonality in usage of the python+onnxruntime backends, you, @onurcayci, and other members of the community can share your builds on https://www.docker.com/products/docker-hub/.
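
For reference, once the composed image has been built with the commands above, checking its size and sharing it might look roughly like this; the local image name tritonserver is assumed to be the compose.py default output name, and the Docker Hub repository is a placeholder:

# Compare the composed image size against the full xx.yy-py3 image
docker images tritonserver
# Share the build, e.g. on Docker Hub (repository name is a placeholder)
docker tag tritonserver <your-dockerhub-user>/tritonserver-onnx:22.05
docker push <your-dockerhub-user>/tritonserver-onnx:22.05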

