Cannot load model from GCS when LD_PRELOAD env var is set
Description
I am trying to deploy the universal sentence encoder using the method described in this issue comment, which requires the LD_PRELOAD trick: pre-loading the tensorflow-text custom-op shared library into the server process so that TensorFlow can register the SentencePiece op. When the LD_PRELOAD environment variable is set, Triton fails to load the model from the GCS bucket:
=============================
== Triton Inference Server ==
=============================
NVIDIA Release 22.05 (build 38317651)
Triton Server Version 2.22.0
Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
https://docs.nvidia.com/datacenter/cloud-native/ .
W0602 16:34:04.094388 1 pinned_memory_manager.cc:236] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
I0602 16:34:04.094496 1 cuda_memory_manager.cc:115] CUDA memory pool disabled
I0602 16:49:09.045304 1 tritonserver.cc:2138]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.22.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | gs://my_bucket/model_repository |
| model_control_mode | MODE_NONE |
| strict_model_config | 0 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0602 16:49:09.045362 1 server.cc:254] No server context available. Exiting immediately.
error: creating server: Internal - Could not get MetaData for bucket with name my_bucket: Retry policy exhausted in GetBucketMetadata: EasyPerform() - CURL error [35]=SSL connect error [UNAVAILABLE]
If I simply unset the LD_PRELOAD environment variable, then Triton successfully loads the model from the GCS bucket, but inference requests fail with Op type not registered 'SentencepieceOp' in binary..., as expected and as described in the issue containing the comment linked above. Furthermore, if I copy the model repository to my local filesystem and set the LD_PRELOAD env var, then I can load the model and successfully serve requests.
Triton Information
What version of Triton are you using? 22.05
Are you using the Triton container or did you build it yourself? Basing off the official image. My Dockerfile:
# Base off the official NVIDIA Triton image
ARG TRITON_VERSION=22.05-py3
FROM nvcr.io/nvidia/tritonserver:${TRITON_VERSION}
# Install tf text package
ARG TF_TEXT_VERSION=2.8.*
RUN pip install tensorflow-text==${TF_TEXT_VERSION}
# Add the TF2 backend directory to LD_LIBRARY_PATH so the preloaded
# tensorflow-text library can resolve its TensorFlow symbols
ARG ADD_LD_LIBRARY_PATH=/opt/tritonserver/backends/tensorflow2
ENV LD_LIBRARY_PATH=${ADD_LD_LIBRARY_PATH}:${LD_LIBRARY_PATH}
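For reference, building this image looks something like the following (the tag triton-tf-text is just my choice):

docker build -t triton-tf-text --build-arg TRITON_VERSION=22.05-py3 .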
To Reproduce
Steps to reproduce the behavior (the model framework, inputs, outputs, and the model configuration file are all described below).
Download the model:
mkdir -p model_repository/muse/1/model.savedmodel
wget https://tfhub.dev/google/universal-sentence-encoder-multilingual/3?tf-hub-format=compressed -O model_repository/muse/1/model.savedmodel/data.tar.gz
tar -xvf model_repository/muse/1/model.savedmodel/data.tar.gz -C model_repository/muse/1/model.savedmodel/
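After extraction, the SavedModel directory should contain the usual layout:

ls model_repository/muse/1/model.savedmodel/
# assets  saved_model.pb  variables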
Add the config to model_repository/muse/config.pbtxt:
name: "muse"
platform: "tensorflow_savedmodel"
max_batch_size: 0
input [
{
name: "inputs"
data_type: TYPE_STRING
dims: [-1]
}
]
output [
{
name: "outputs"
data_type: TYPE_FP32
dims: [-1, 512]
}
]
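For reference, once the model is serving, a request matching this config can be issued over Triton's KServe v2 HTTP API (a minimal sketch; the example sentence is arbitrary, and TYPE_STRING tensors travel as BYTES with JSON string data):

curl -s localhost:8000/v2/models/muse/infer \
  -H "Content-Type: application/json" \
  -d '{"inputs": [{"name": "inputs", "shape": [1], "datatype": "BYTES", "data": ["Hello world"]}]}'
# the response contains an "outputs" tensor of shape [1, 512]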
Copy the model to a GCS bucket:
gsutil cp -r model_repository/muse/ gs://<BUCKET_NAME>/model_repository/
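You can sanity-check the upload with:

gsutil ls -r gs://<BUCKET_NAME>/model_repository/muse/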
Run Triton:
docker run -t -p 8000:8000 --rm \
-v $(pwd):/workspace \
--name=tritonserver \
-e AIP_MODE=True \
-e LD_PRELOAD=/usr/local/lib/python3.8/dist-packages/tensorflow_text/python/ops/_sentencepiece_tokenizer.so \
-e GOOGLE_APPLICATION_CREDENTIALS=/workspace/gcs_creds.json \
<IMAGE_NAME> \
--model-repository gs://my_bucket/model_repository \
--strict-model-config=false \
--log-verbose=1 \
--backend-config=tensorflow,version=2
In my case, I need my Google credentials in the current working directory at gcs_creds.json.
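If you need to generate such a key file, something along these lines works (the service-account address here is hypothetical):

gcloud iam service-accounts keys create gcs_creds.json \
  --iam-account=my-sa@my-project.iam.gserviceaccount.com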
Expected behavior
Should be able to load the universal sentence encoder from a GCS bucket and correctly serve requests.
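As a quick check that the server reached this state, the readiness endpoint can be polled:

curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/health/ready
# 200 once the server and all models are ready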
Top GitHub Comments
@lross68 Glad you’re unblocked! I’m closing this issue for now. We will update once we find a better, permanent solution for this.
Thanks @krishung5, this workaround does appear to do the trick: I am able to load the universal sentence encoder model from a GCS bucket and run inference successfully. I look forward to hearing what more permanent solutions become available, but this workaround should hold us over!