21.12-py3 Server launching error when hosting a TRT model with custom plugin
Description
We want to use Triton server to host models with custom plugins built in the TensorRT 21.12-py3 docker environment, and we get different types of errors.
We created a minimal example reproducing the issue; in this case the error is
E0105 05:32:15.146171 1 logging.cc:43] 1: [checkMacros.cpp::catchCudaError::272] Error Code 1: Cuda Runtime (initialization error)
Triton Information
What version of Triton are you using?
21.12-py3
Are you using the Triton container or did you build it yourself?
We use the container directly: docker pull nvcr.io/nvidia/tritonserver:21.12-py3
To Reproduce
Steps to reproduce the behavior.
We created a minimal example reproducing the issue:
https://github.com/zmy1116/triton_server_custom_plugin_issue_21_12
The model consists of a single custom plugin layer based on https://github.com/NVIDIA/TensorRT/tree/main/samples/python/uff_custom_plugin
It takes an input of size 1x10 and clips the values between 0.0 and 0.5.
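In other words, the model's intended behavior is an element-wise clamp. A rough NumPy illustration, added here only to make the expected output concrete:

    import numpy as np

    # a 1x10 input, matching the model's input shape
    x = np.random.rand(1, 10).astype(np.float32)

    # the custom plugin is expected to act like an element-wise clamp to [0.0, 0.5]
    expected = np.clip(x, 0.0, 0.5)
    print(expected)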
We use the TensorRT 21.12 environment to build the model engine:
docker pull nvcr.io/nvidia/tensorrt:21.12-py3
We then host it directly using the Triton server container.
The full procedure is described in the attached repository; to summarize:
- In the TRT docker environment:
docker run --gpus all -it -p8889:8889 --rm -v /home/ubuntu:/workspace/ubuntu nvcr.io/nvidia/tensorrt:21.12-py3
- Build the plugin:
git clone https://github.com/zmy1116/triton_server_custom_plugin_issue_21_12
cd triton_server_custom_plugin_issue_21_12/custom_plugin
mkdir build
cd build
cmake ..
make
cd ../../
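As an optional sanity check (not part of the original repository), one can verify that the freshly built library loads and registers its plugin creator with TensorRT. This is a minimal sketch; the library path and the plugin name "CustomClipPlugin" are assumptions based on the uff_custom_plugin sample:

    import ctypes
    import tensorrt as trt

    # load the shared library so its plugin creator registers itself
    ctypes.CDLL("custom_plugin/build/libclipplugin.so")  # path assumed

    logger = trt.Logger(trt.Logger.INFO)
    trt.init_libnvinfer_plugins(logger, "")

    # list all registered plugin creators; the custom clip plugin should appear here
    print([c.name for c in trt.get_plugin_registry().plugin_creator_list])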
- Create the TRT model engine:
python create_engine.py
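For context, the core of an engine-building script like create_engine.py would look roughly like the sketch below. This is not the repository's exact code; the plugin creator name ("CustomClipPlugin"), its field names (clipMin/clipMax), the tensor names, and the file paths are assumptions:

    import ctypes
    import numpy as np
    import tensorrt as trt

    # load the custom plugin library so its creator is registered
    ctypes.CDLL("custom_plugin/build/libclipplugin.so")

    logger = trt.Logger(trt.Logger.INFO)
    trt.init_libnvinfer_plugins(logger, "")

    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    inp = network.add_input("input", trt.float32, (1, 10))

    # look up the plugin creator and instantiate the clip plugin with its min/max fields
    creator = trt.get_plugin_registry().get_plugin_creator("CustomClipPlugin", "1")
    fields = trt.PluginFieldCollection([
        trt.PluginField("clipMin", np.array([0.0], np.float32), trt.PluginFieldType.FLOAT32),
        trt.PluginField("clipMax", np.array([0.5], np.float32), trt.PluginFieldType.FLOAT32),
    ])
    clip = network.add_plugin_v2([inp], creator.create_plugin("clip", fields))
    network.mark_output(clip.get_output(0))

    # build and serialize the engine to the file Triton will serve
    config = builder.create_builder_config()
    serialized = builder.build_serialized_network(network, config)
    with open("model.plan", "wb") as f:
        f.write(serialized)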
- Organize the produced engine and plugin in a model repository folder (a rough example layout is shown below) and launch the Triton server:
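Based on the paths in the launch command, the repository presumably looks along these lines (placing the serialized engine as model.plan under version directory 1 follows the standard Triton layout; the exact file names are assumptions):

    /home/ubuntu/dummy_repository/
        dummy/
            libclipplugin.so
            1/
                model.plan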
docker run --gpus=all --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v /home/ubuntu/dummy_repository:/models -e LD_PRELOAD=/models/dummy/libclipplugin.so nvcr.io/nvidia/tritonserver:21.12-py3 tritonserver --model-repository=/models --strict-model-config=false
You will then see the screen fill with the following error:
E0105 05:32:15.146037 1 logging.cc:43] 1: [checkMacros.cpp::catchCudaError::272] Error Code 1: Cuda Runtime (initialization error)
Expected behavior
I would expect the server to launch properly. This is a minimal example taken directly from the TensorRT samples.
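For completeness, once the server launches we would query the model with something like the sketch below. The model name "dummy" matches the repository layout above, but the input/output tensor names are assumptions:

    import numpy as np
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url="localhost:8000")

    x = np.random.rand(1, 10).astype(np.float32)
    inp = httpclient.InferInput("input", list(x.shape), "FP32")  # tensor name assumed
    inp.set_data_from_numpy(x)

    result = client.infer(model_name="dummy", inputs=[inp])
    print(result.as_numpy("output"))  # tensor name assumed; expected: x clipped to [0.0, 0.5]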
For one of our own plugins, we actually see a different error:
[Torch-TensorRT] - Unable to read CUDA capable devices. Return status
Please let me know if you need additional information.
Thanks
Top GitHub Comments
We are implementing an improved method to load shared libraries that implement TensorRT custom operations. @tanmayv25 please link this issue to the PR when you submit it.
@deadeyegoodwin
Yes, I've tested hosting models that do not use custom plugins, and they work.
Yes, this specific example runs directly in the TensorRT docker environment:
docker pull nvcr.io/nvidia/tensorrt:21.12-py3
Yes, models can be run on the GPU using PyTorch/TensorFlow.