21.12-py3 Server launching error when hosting a TRT model with custom plugin
Description
We want to use Triton server to host models with custom plugins built in the TensorRT 21.12-py3 docker environment, and we get different types of errors.
We created a minimal example reproducing the issue; in this case the error is
E0105 05:32:15.146171 1 logging.cc:43] 1: [checkMacros.cpp::catchCudaError::272] Error Code 1: Cuda Runtime (initialization error)
Triton Information
What version of Triton are you using?
21.12-py3
Are you using the Triton container or did you build it yourself?
We use the container directly: docker pull nvcr.io/nvidia/tritonserver:21.12-py3
To Reproduce
Steps to reproduce the behavior.
We created a minimal example reproducing the issue:
https://github.com/zmy1116/triton_server_custom_plugin_issue_21_12
The model consists of a single custom plugin layer based on https://github.com/NVIDIA/TensorRT/tree/main/samples/python/uff_custom_plugin
It takes an input of size 1x10 and clips the values between 0.0 and 0.5.
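In other words, the model's intended behavior is an element-wise clamp. A rough NumPy illustration, added here only to make the expected output concrete:

    import numpy as np

    # a 1x10 input, matching the model's input shape
    x = np.random.rand(1, 10).astype(np.float32)

    # the custom plugin is expected to act like an element-wise clamp to [0.0, 0.5]
    expected = np.clip(x, 0.0, 0.5)
    print(expected)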
We use the TensorRT 21.12 environment to build the model engine:
docker pull nvcr.io/nvidia/tensorrt:21.12-py3
We then host it directly using the Triton server container.
The full procedure is described in the attached repository; to summarize:
- In the TRT docker environment:
docker run --gpus all -it -p8889:8889 --rm -v /home/ubuntu:/workspace/ubuntu nvcr.io/nvidia/tensorrt:21.12-py3
- Build the plugin:
git clone https://github.com/zmy1116/triton_server_custom_plugin_issue_21_12
cd triton_server_custom_plugin_issue_21_12/custom_plugin
mkdir build
cd build
cmake ..
make
cd ../../
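As an optional sanity check (not part of the original repository), one can verify that the freshly built library loads and registers its plugin creator with TensorRT. This is a minimal sketch; the library path and the plugin name "CustomClipPlugin" are assumptions based on the uff_custom_plugin sample:

    import ctypes
    import tensorrt as trt

    # load the shared library so its plugin creator registers itself
    ctypes.CDLL("custom_plugin/build/libclipplugin.so")  # path assumed

    logger = trt.Logger(trt.Logger.INFO)
    trt.init_libnvinfer_plugins(logger, "")

    # list all registered plugin creators; the custom clip plugin should appear here
    print([c.name for c in trt.get_plugin_registry().plugin_creator_list])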
- Create the TRT model engine:
python create_engine.py
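For context, the core of an engine-building script like create_engine.py would look roughly like the sketch below. This is not the repository's exact code; the plugin creator name ("CustomClipPlugin"), its field names (clipMin/clipMax), the tensor names, and the file paths are assumptions:

    import ctypes
    import numpy as np
    import tensorrt as trt

    # load the custom plugin library so its creator is registered
    ctypes.CDLL("custom_plugin/build/libclipplugin.so")

    logger = trt.Logger(trt.Logger.INFO)
    trt.init_libnvinfer_plugins(logger, "")

    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    inp = network.add_input("input", trt.float32, (1, 10))

    # look up the plugin creator and instantiate the clip plugin with its min/max fields
    creator = trt.get_plugin_registry().get_plugin_creator("CustomClipPlugin", "1")
    fields = trt.PluginFieldCollection([
        trt.PluginField("clipMin", np.array([0.0], np.float32), trt.PluginFieldType.FLOAT32),
        trt.PluginField("clipMax", np.array([0.5], np.float32), trt.PluginFieldType.FLOAT32),
    ])
    clip = network.add_plugin_v2([inp], creator.create_plugin("clip", fields))
    network.mark_output(clip.get_output(0))

    # build and serialize the engine to the file Triton will serve
    config = builder.create_builder_config()
    serialized = builder.build_serialized_network(network, config)
    with open("model.plan", "wb") as f:
        f.write(serialized)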
- Organize the produced engine and plugin in a model repository folder (a rough example layout is shown below) and launch the Triton server:
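Based on the paths in the launch command, the repository presumably looks along these lines (placing the serialized engine as model.plan under version directory 1 follows the standard Triton layout; the exact file names are assumptions):

    /home/ubuntu/dummy_repository/
        dummy/
            libclipplugin.so
            1/
                model.plan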
docker run --gpus=all --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v /home/ubuntu/dummy_repository:/models -e LD_PRELOAD=/models/dummy/libclipplugin.so nvcr.io/nvidia/tritonserver:21.12-py3 tritonserver --model-repository=/models --strict-model-config=false
You will then see the screen fill with the following error:
E0105 05:32:15.146037 1 logging.cc:43] 1: [checkMacros.cpp::catchCudaError::272] Error Code 1: Cuda Runtime (initialization error)
Expected behavior
I would expect the server to launch properly. This is a minimal example taken directly from the TensorRT samples.
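For completeness, once the server launches we would query the model with something like the sketch below. The model name "dummy" matches the repository layout above, but the input/output tensor names are assumptions:

    import numpy as np
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url="localhost:8000")

    x = np.random.rand(1, 10).astype(np.float32)
    inp = httpclient.InferInput("input", list(x.shape), "FP32")  # tensor name assumed
    inp.set_data_from_numpy(x)

    result = client.infer(model_name="dummy", inputs=[inp])
    print(result.as_numpy("output"))  # tensor name assumed; expected: x clipped to [0.0, 0.5]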
For one of our own plugins, we actually see a different error:
[Torch-TensorRT] - Unable to read CUDA capable devices. Return status
Please let me know if you need additional information.
Thanks
Top GitHub Comments
We are implementing an improved method to load shared libraries that implement TensorRT custom operations. @tanmayv25 please link this issue to the PR when you submit it.
@deadeyegoodwin
Yes, I've tested hosting models that do not use custom plugins, and they work.
Yes, this specific example runs directly in the TensorRT docker environment:
docker pull nvcr.io/nvidia/tensorrt:21.12-py3
Yes, models can be run on the GPU using PyTorch/TensorFlow.