My TensorRT model cannot be loaded by Triton Server
See original GitHub issue

Error info:
E1217 06:50:10.976331 1 logging.cc:43] 1: [stdArchiveReader.cpp::StdArchiveReader::34] Error Code 1: Serialization (Serialization assertion safeVersionRead == safeSerializationVersion failed.Version tag does not match. Note: Current Version: 43, Serialized Engine Version: 0)
E1217 06:50:10.976343 1 logging.cc:43] 4: [runtime.cpp::deserializeCudaEngine::75] Error Code 4: Internal Error (Engine deserialization failed.)
I got my TRT model by converting an ONNX model outside the container, and my TRT version is 8.0.3.4.
I can also run the TRT model outside the container.
My Triton docker image version: 21.11-py3
What should I do to solve this?
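The two version numbers in the log (Current Version: 43 vs. Serialized Engine Version: 0) indicate that the plan file was serialized by a TensorRT build that does not match the runtime inside the container. A quick way to confirm this, shown below as a minimal sketch (assuming the tensorrt Python package is available in both environments; model.plan is a placeholder path for the engine file), is to compare versions and try deserializing the engine directly:

import tensorrt as trt

# Print the TensorRT version of the current environment. Run this both
# where the engine was built and inside the Triton container; the two
# must match for deserialization to succeed.
print("TensorRT version:", trt.__version__)

# Attempt to deserialize the engine. A version mismatch fails here with
# the same "Serialization assertion" error that Triton logs.
logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
with open("model.plan", "rb") as f:  # placeholder path to the engine
    engine = runtime.deserialize_cuda_engine(f.read())
print("deserialized OK" if engine is not None else "deserialization failed")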
Issue Analytics
- Created 2 years ago
- Comments: 5 (2 by maintainers)
Top Results From Across the Web

Serving TensorRT Models with NVIDIA Triton Inference Server
In real-time AI model deployment en masse, efficiency of model inference and hardware/GPU usage is paramount.

NVIDIA Triton Inference Server Container Versions
If you encounter accuracy issues with your TensorRT model, you can work around the issue by enabling the output_copy_stream option in your ...

Serving a Torch-TensorRT model with Triton - PyTorch
Let's discuss, step by step, the process of optimizing a model with Torch-TensorRT, deploying it on Triton Inference Server, and building a client to query ...

Serve multiple models with Amazon SageMaker and Triton ...
You can use NVIDIA Triton Inference Server to serve models for ... You can generate your own TensorRT engines according to your needs. ...

Serving Predictions with NVIDIA Triton | Vertex AI
NVIDIA Triton inference server (Triton) is an open-source inference serving ... Specifically, TensorRT, TensorFlow SavedModel, and ONNX models do not ...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I met the same problem. It looks like a TRT version mismatch. I solved it by using the trtexec command inside the Triton docker container; the trtexec path in the Triton docker image is /usr/src/bin/trtexec.
Looks like a TRT version mismatch. The TRT version that you have used seems to be correct according to the DL Framework Support Matrix. Have you generated the TRT file inside the TensorRT 21.11 container?
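For reference, the rebuild that trtexec performs can also be scripted with the TensorRT Python API. This is a minimal sketch, not the posters' exact commands: it assumes the tensorrt Python package inside the matching 21.11 container, and model.onnx / model.plan are placeholder file names (model.plan is the default engine file name Triton looks for in a model repository).

import tensorrt as trt

# Rebuild the engine from the original ONNX model *inside* the matching
# container, so the serialized plan matches the TRT runtime Triton uses.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:  # placeholder: the source ONNX model
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parse failed")

config = builder.create_builder_config()
config.max_workspace_size = 1 << 30  # 1 GiB; TRT 8.0-era builder option

serialized = builder.build_serialized_network(network, config)
with open("model.plan", "wb") as f:  # placeholder: engine file for Triton
    f.write(serialized)

After rebuilding, the plan goes at models/<model_name>/1/model.plan in the Triton model repository.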