Cannot load Custom Op file in the container: LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
Description
I am in the development phase of running a deep learning model on Triton Inference Server. I am using the LD_PRELOAD trick to load the custom ops needed to support inference, but the libraries do not load correctly and give the following errors in the container logs:
priyankasaraf@priyank-ltmatct script % kubectl logs triton-54c965dcd9-tqkjx -c triton-server
ERROR: ld.so: object '/triton/lib/_sentencepiece_tokenizer.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object '/triton/lib/_normalize_ops.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object '/triton/lib/_regex_split_ops.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object '/triton/lib/_wordpiece_tokenizer.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
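"cannot open shared object file" from ld.so typically means the listed path does not exist or is not readable when the process starts, so a first check is whether the files are actually present inside the running container. A minimal diagnostic sketch, assuming the pod name from the logs above and that ldd is available in the image:
kubectl exec triton-54c965dcd9-tqkjx -c triton-server -- ls -l /triton/lib/
# If the files exist, check that each library's own dependencies resolve against the TF2 backend:
kubectl exec triton-54c965dcd9-tqkjx -c triton-server -- \
  bash -c 'LD_LIBRARY_PATH=/opt/tritonserver/backends/tensorflow2:$LD_LIBRARY_PATH ldd /triton/lib/_sentencepiece_tokenizer.so'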
Expected behavior: a successful inference.
But the current response is:
"Op type not registered 'CaseFoldUTF8' in binary running on triton-54c965dcd9-tqkjx. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.)
tf.contrib.resampler should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed."
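The two errors are two sides of the same problem: the preload failed, so the custom ops those libraries register never reached the TensorFlow runtime inside Triton, and the SavedModel graph then fails on the first unregistered op. If the libraries are present but the op is still reported as missing, one hedged follow-up check is to confirm the op name actually appears in one of the preloaded libraries (which library provides CaseFoldUTF8 is an assumption here; it is typically TF Text's normalize ops):
kubectl exec triton-54c965dcd9-tqkjx -c triton-server -- grep -ac CaseFoldUTF8 /triton/lib/_normalize_ops.so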
Environment
TensorRT Version:
GPU Type: GPU 0: Tesla V100-SXM2-16GB
TRITON_SERVER_VERSION=2.15.0
NVIDIA_TRITON_SERVER_VERSION=21.10
NSIGHT_SYSTEMS_VERSION=2021.3.2.4
Triton Image: 21.10
CUDA Version: CUDA_VERSION=11.4.3.001
CUDA_DRIVER_VERSION=470.57.02
CUDNN Version:
Operating System + Version: Ubuntu 20.04.3 LTS (focal)
Python Version (if applicable): python3.8
Are you using the Triton container or did you build it yourself? Using image 21.10
To Reproduce
A Deployment is created with one of the containers running the triton-inference-server with the following arguments (the triton-server container's YAML is below).
containers:
- name: triton-server
image: "21.10"
command: ["/bin/bash"]
# About "backend-config": All backends are initialized; pytorch, tensorflow, openvino & onnxruntime.
# We are overriding Tensorflow version to be loaded by default to 2 (Rest of them will still load)
# --backend-config=tensorflow,version=2
# Ref: https://github.com/triton-inference-server/tensorflow_backend/blob/40f9d94ca1243de004c609cf9b056de19462d545/README.md
args: ["-c",
"export LD_LIBRARY_PATH=/opt/tritonserver/backends/tensorflow2:$LD_LIBRARY_PATH
&& export LD_PRELOAD='/triton/lib/_sentencepiece_tokenizer.so /triton/lib/_normalize_ops.so
/triton/lib/_regex_split_ops.so /triton/lib/_wordpiece_tokenizer.so'
&& tritonserver
--model-repository=/models/triton
--backend-config=tensorflow,version=2
--log-verbose=5
--log-info=true
--log-warning=true
--log-error=true
--http-port=8000
--grpc-port=8001
--metrics-port=8002
--model-control-mode=explicit
--grpc-use-ssl=false"
]
volumeMounts:
- mountPath: /models/triton
name: models
- mountPath: /triton/lib
name: libraries
imagePullPolicy: Always
livenessProbe:
failureThreshold: 5
httpGet:
path: /v2/health/live
port: http
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 1
ports:
- containerPort: 8000
name: http
protocol: TCP
- containerPort: 8001
name: grpc
protocol: TCP
- containerPort: 8002
name: http-metrics
protocol: TCP
readinessProbe:
successThreshold: 1
failureThreshold: 3
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 1
httpGet:
path: /v2/health/live
port: http
scheme: HTTP
resources:
requests:
cpu: 2
memory: 12G
limits:
cpu: 3
memory: 24G
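To verify that the bash -c script really exported LD_PRELOAD before launching tritonserver, the server process's own environment can be inspected. A sketch, assuming pgrep (procps) is available in the image and reusing the pod name from the logs above:
kubectl exec triton-54c965dcd9-tqkjx -c triton-server -- \
  bash -c 'tr "\0" "\n" < /proc/$(pgrep -o tritonserver)/environ | grep -E "LD_PRELOAD|LD_LIBRARY_PATH"'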
Describe the models (framework, inputs, outputs), ideally including the model configuration file (if using an ensemble, include the model configuration file for that as well).
config.pbtxt:
name: "c8d9316a-1cfb-4b4b-aea3-3659e9dc5a17"
platform: "tensorflow_savedmodel"
input {
name: "input_1"
data_type: TYPE_STRING
dims: [-1, 1]
}
output {
name: "model_exporter"
data_type: TYPE_FP32
dims: [-1, 768]
}
instance_group {
count: 1
}
Metadata.json
{
"inputs": [
{
"datatype": "BYTES",
"name": "input_1",
"shape": [
-1,
1
]
}
],
"model_id": "SearchQnA",
"model_version": "1",
"outputs": [
{
"datatype": "FP32",
"name": "model_exporter",
"shape": [
-1,
768
]
}
],
"platform": "TENSORFLOW"
}
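A side note on the server flags above: because tritonserver runs with --model-control-mode=explicit, the model is not loaded automatically at startup and has to be loaded through the repository API before inference. A minimal sketch using Triton's standard HTTP endpoints and the model name from config.pbtxt (assumes port 8000 is reachable, e.g. via kubectl port-forward):
curl -s -X POST localhost:8000/v2/repository/models/c8d9316a-1cfb-4b4b-aea3-3659e9dc5a17/load
# Returns the model metadata (should match Metadata.json above) once the model is loaded:
curl -s localhost:8000/v2/models/c8d9316a-1cfb-4b4b-aea3-3659e9dc5a17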
Top GitHub Comments
Thanks. I was able to find the root cause of the issue. I have two containers: the first one downloads the necessary binaries to a shared volume mount, and the second one runs the triton-inference-server, which should use those binaries from the shared volume mount. But since the containers spin up in parallel, the triton-server container tries to load the binaries before the first container has finished downloading them. Because of this, the second container logs the LD_PRELOAD errors but starts the triton-server anyway, without the custom op libraries.
Got it, that makes sense. Thanks for investigating and updating us!
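For anyone hitting the same race: one way to close it, sketched here as a suggestion rather than the reporter's actual fix, is to make the triton-server container wait until the shared volume is populated before starting tritonserver (a Kubernetes initContainer that performs the download before the main containers start would remove the race entirely). In the existing bash -c script the wait could look like:
until ls /triton/lib/_sentencepiece_tokenizer.so /triton/lib/_normalize_ops.so \
         /triton/lib/_regex_split_ops.so /triton/lib/_wordpiece_tokenizer.so >/dev/null 2>&1; do
  echo "waiting for custom op libraries in /triton/lib ..."
  sleep 2
done
# ...then export LD_PRELOAD and start tritonserver exactly as in the args above.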