
Cannot load Custom Op file in the container LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.


Description

I am in the development phase of running a deep learning model on Triton Inference Server. I am using the LD_PRELOAD trick to load the custom ops needed to support inference, but the libraries do not load correctly and give the following error in the container logs:

 priyankasaraf@priyank-ltmatct script % kubectl logs triton-54c965dcd9-tqkjx -c triton-server   
ERROR: ld.so: object '/triton/lib/_sentencepiece_tokenizer.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object '/triton/lib/_normalize_ops.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object '/triton/lib/_regex_split_ops.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object '/triton/lib/_wordpiece_tokenizer.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.

Expected behavior is a successful inference, but the current response is: "Op type not registered 'CaseFoldUTF8' in binary running on triton-54c965dcd9-tqkjx. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) tf.contrib.resampler should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed."
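Before digging into the deployment, it helps to rule out the simpler causes of this error: confirm from outside the pod that the shared objects actually exist in the container, and that the dynamic loader can resolve their own dependencies. A minimal sketch, reusing the pod and container names from the log above:

# Confirm the shared objects are present in the mounted volume
kubectl exec triton-54c965dcd9-tqkjx -c triton-server -- ls -l /triton/lib

# Check that each library's own dependencies resolve; a "not found" entry
# here yields the same "cannot open shared object file" message even when
# the file itself exists
kubectl exec triton-54c965dcd9-tqkjx -c triton-server -- ldd /triton/lib/_sentencepiece_tokenizer.so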

Environment

  • TensorRT Version:
  • GPU Type: GPU 0: Tesla V100-SXM2-16GB
  • TRITON_SERVER_VERSION=2.15.0
  • NVIDIA_TRITON_SERVER_VERSION=21.10
  • NSIGHT_SYSTEMS_VERSION=2021.3.2.4
  • Triton Image: 21.10
  • CUDA Version: CUDA_VERSION=11.4.3.001
  • CUDA_DRIVER_VERSION=470.57.02
  • CUDNN Version:
  • Operating System + Version: Ubuntu 20.04.3 LTS (focal)
  • Python Version (if applicable): python3.8

Are you using the Triton container or did you build it yourself? Using image 21.10

To Reproduce: A deployment is created with one container running the triton-inference-server with the following arguments (the triton-server container's YAML spec is below).

      containers:
        - name: triton-server
          image: "21.10"
          command: ["/bin/bash"]
          # About "backend-config": All backends are initialized; pytorch, tensorflow, openvino & onnxruntime. 
          # We are overriding Tensorflow version to be loaded by default to 2 (Rest of them will still load)
          # --backend-config=tensorflow,version=2
          # Ref: https://github.com/triton-inference-server/tensorflow_backend/blob/40f9d94ca1243de004c609cf9b056de19462d545/README.md
          args: ["-c",
                 "export LD_LIBRARY_PATH=/opt/tritonserver/backends/tensorflow2:$LD_LIBRARY_PATH
                 && export LD_PRELOAD='/triton/lib/_sentencepiece_tokenizer.so /triton/lib/_normalize_ops.so /triton/lib/_regex_split_ops.so /triton/lib/_wordpiece_tokenizer.so'
                 && tritonserver
                 --model-repository=/models/triton
                 --backend-config=tensorflow,version=2
                 --log-verbose=5
                 --log-info=true
                 --log-warning=true
                 --log-error=true
                 --http-port=8000
                 --grpc-port=8001
                 --metrics-port=8002
                 --model-control-mode=explicit
                 --grpc-use-ssl=false"
          ]
          volumeMounts:
            - mountPath: /models/triton
              name: models
            - mountPath: /triton/lib
              name: libraries
          imagePullPolicy: Always
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /v2/health/live
              port: http
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 5
            successThreshold: 1
            timeoutSeconds: 1
          ports:
          - containerPort: 8000
            name: http
            protocol: TCP
          - containerPort: 8001
            name: grpc
            protocol: TCP
          - containerPort: 8002
            name: http-metrics
            protocol: TCP
          readinessProbe:
            successThreshold: 1
            failureThreshold: 3
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 1
            httpGet:
              path: /v2/health/live
              port: http
              scheme: HTTP
          resources:
            requests:
              cpu: 2
              memory: 12G
            limits:
              cpu: 3
              memory: 24G

Model description (framework, inputs, outputs): the model is a TensorFlow SavedModel with one string input and one FP32 output. Its model configuration file, config.pbtxt:

name: "c8d9316a-1cfb-4b4b-aea3-3659e9dc5a17"
platform: "tensorflow_savedmodel"
input {
  name: "input_1"
  data_type: TYPE_STRING
  dims: [-1, 1]
}
output {
  name: "model_exporter"
  data_type: TYPE_FP32
  dims: [-1, 768]
}
instance_group {
  count: 1
}

Metadata.json

{
    "inputs": [
        {
            "datatype": "BYTES",
            "name": "input_1",
            "shape": [
                -1,
                1
            ]
        }
    ],
    "model_id": "SearchQnA",
    "model_version": "1",
    "outputs": [
        {
            "datatype": "FP32",
            "name": "model_exporter",
            "shape": [
                -1,
                768
            ]
        }
    ],
    "platform": "TENSORFLOW"
}
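For context, an inference request matching this configuration looks roughly like the sketch below, using Triton's KServe-v2 HTTP API on the --http-port configured above (assuming the port is reachable, e.g. via kubectl port-forward). The query string is a made-up example, and because the server runs with --model-control-mode=explicit, the model must be loaded through the repository API first:

# Load the model explicitly (required with --model-control-mode=explicit)
curl -X POST localhost:8000/v2/repository/models/c8d9316a-1cfb-4b4b-aea3-3659e9dc5a17/load

# Send one string through the TYPE_STRING (BYTES) input
curl -X POST localhost:8000/v2/models/c8d9316a-1cfb-4b4b-aea3-3659e9dc5a17/infer \
  -H "Content-Type: application/json" \
  -d '{
        "inputs": [
          {"name": "input_1", "datatype": "BYTES", "shape": [1, 1], "data": ["example query"]}
        ],
        "outputs": [{"name": "model_exporter"}]
      }'

The 'CaseFoldUTF8' response quoted above is what this kind of request returns when the preloaded custom-op libraries are missing.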

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 9 (6 by maintainers)

Top GitHub Comments

PRIYANKArythem3 commented on Jul 19, 2022:

Thanks. I was able to find the root cause of the issue. I have two containers: the first downloads the necessary binaries to a shared volume mount, and the second runs the triton-inference-server, which should use those binaries from the shared volume mount. But since the containers spin up in parallel, the triton-server container tries to load the binaries before the first container has finished downloading them. Because of this, the second container throws the LD_PRELOAD error but starts the triton-server anyway, without the binaries.
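The standard Kubernetes fix for this kind of startup race is to move the download into an init container, which is guaranteed to run to completion before the main containers start. A minimal sketch against the deployment above; the image name and copy command are placeholders, not the poster's actual setup:

      initContainers:
        # Runs to completion before the main containers start, so
        # /triton/lib is already populated when triton-server evaluates
        # LD_PRELOAD
        - name: fetch-custom-ops
          image: binary-downloader:latest                               # placeholder image
          command: ["/bin/sh", "-c", "cp /opt/ops/*.so /triton/lib/"]   # placeholder download/copy step
          volumeMounts:
            - mountPath: /triton/lib
              name: libraries
      containers:
        - name: triton-server
          # ... unchanged from the spec above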

dyastremsky commented on Jul 19, 2022:

Got it, that makes sense. Thanks for investigating and updating us!
