Cannot load model from GCS when LD_PRELOAD env var is set
Description
I am trying to deploy the universal sentence encoder using the method described in this issue comment, which requires the LD_PRELOAD trick: pre-loading the tensorflow-text custom-op shared library into the server process so that TensorFlow can register the SentencePiece op. When the LD_PRELOAD environment variable is set, Triton fails to load the model from the GCS bucket:
=============================
== Triton Inference Server ==
=============================
NVIDIA Release 22.05 (build 38317651)
Triton Server Version 2.22.0
Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
https://docs.nvidia.com/datacenter/cloud-native/ .
W0602 16:34:04.094388 1 pinned_memory_manager.cc:236] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
I0602 16:34:04.094496 1 cuda_memory_manager.cc:115] CUDA memory pool disabled
I0602 16:49:09.045304 1 tritonserver.cc:2138]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.22.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | gs://my_bucket/model_repository |
| model_control_mode | MODE_NONE |
| strict_model_config | 0 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0602 16:49:09.045362 1 server.cc:254] No server context available. Exiting immediately.
error: creating server: Internal - Could not get MetaData for bucket with name my_bucket: Retry policy exhausted in GetBucketMetadata: EasyPerform() - CURL error [35]=SSL connect error [UNAVAILABLE]
If I simply unset the LD_PRELOAD environment variable, then Triton successfully loads the model from the GCS bucket, but inference requests fail with Op type not registered 'SentencepieceOp' in binary..., as expected and as described in the issue containing the comment linked above. Furthermore, if I copy the model repository to my local filesystem and set the LD_PRELOAD env var, then I can load the model and successfully serve requests.
Triton Information
What version of Triton are you using? 22.05
Are you using the Triton container or did you build it yourself? Basing off the official image. My Dockerfile:
# Base off the official NVIDIA Triton image
ARG TRITON_VERSION=22.05-py3
FROM nvcr.io/nvidia/tritonserver:${TRITON_VERSION}
# Install tf text package
ARG TF_TEXT_VERSION=2.8.*
RUN pip install tensorflow-text==${TF_TEXT_VERSION}
# Add the TF2 backend directory to LD_LIBRARY_PATH so the preloaded
# tensorflow-text library can resolve its TensorFlow symbols
ARG ADD_LD_LIBRARY_PATH=/opt/tritonserver/backends/tensorflow2
ENV LD_LIBRARY_PATH=${ADD_LD_LIBRARY_PATH}:${LD_LIBRARY_PATH}
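For reference, building this image looks something like the following (the tag triton-tf-text is just my choice):

docker build -t triton-tf-text --build-arg TRITON_VERSION=22.05-py3 .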
To Reproduce
Steps to reproduce the behavior (the model framework, inputs, outputs, and the model configuration file are all described below).
Download the model:
mkdir -p model_repository/muse/1/model.savedmodel
wget https://tfhub.dev/google/universal-sentence-encoder-multilingual/3?tf-hub-format=compressed -O model_repository/muse/1/model.savedmodel/data.tar.gz
tar -xvf model_repository/muse/1/model.savedmodel/data.tar.gz -C model_repository/muse/1/model.savedmodel/
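After extraction, the SavedModel directory should contain the usual layout:

ls model_repository/muse/1/model.savedmodel/
# assets  saved_model.pb  variables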
Add the config to model_repository/muse/config.pbtxt:
name: "muse"
platform: "tensorflow_savedmodel"
max_batch_size: 0
input [
{
name: "inputs"
data_type: TYPE_STRING
dims: [-1]
}
]
output [
{
name: "outputs"
data_type: TYPE_FP32
dims: [-1, 512]
}
]
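For reference, once the model is serving, a request matching this config can be issued over Triton's KServe v2 HTTP API (a minimal sketch; the example sentence is arbitrary, and TYPE_STRING tensors travel as BYTES with JSON string data):

curl -s localhost:8000/v2/models/muse/infer \
  -H "Content-Type: application/json" \
  -d '{"inputs": [{"name": "inputs", "shape": [1], "datatype": "BYTES", "data": ["Hello world"]}]}'
# the response contains an "outputs" tensor of shape [1, 512]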
Copy the model to a GCS bucket:
gsutil cp -r model_repository/muse/ gs://<BUCKET_NAME>/model_repository/
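You can sanity-check the upload with:

gsutil ls -r gs://<BUCKET_NAME>/model_repository/muse/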
Run Triton:
docker run -t -p 8000:8000 --rm \
-v $(pwd):/workspace \
--name=tritonserver \
-e AIP_MODE=True \
-e LD_PRELOAD=/usr/local/lib/python3.8/dist-packages/tensorflow_text/python/ops/_sentencepiece_tokenizer.so \
-e GOOGLE_APPLICATION_CREDENTIALS=/workspace/gcs_creds.json \
<IMAGE_NAME> \
--model-repository gs://my_bucket/model_repository \
--strict-model-config=false \
--log-verbose=1 \
--backend-config=tensorflow,version=2
In my case, I need my Google credentials in the current working directory at gcs_creds.json.
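If you need to generate such a key file, something along these lines works (the service-account address here is hypothetical):

gcloud iam service-accounts keys create gcs_creds.json \
  --iam-account=my-sa@my-project.iam.gserviceaccount.com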
Expected behavior
Should be able to load the universal sentence encoder from a GCS bucket and correctly serve requests.
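As a quick check that the server reached this state, the readiness endpoint can be polled:

curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/health/ready
# 200 once the server and all models are ready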
Top GitHub Comments
@lross68 Glad you’re unblocked! I’m closing this issue for now. We will update once we find a better, permanent solution for this.
Thanks @krishung5, this workaround does appear to do the trick: I am able to load the universal sentence encoder model from a GCS bucket and run inference successfully. I look forward to hearing what more permanent solutions become available, but this workaround should hold us over!