
Triton requiring config.pbtxt when loading models from S3 (MinIO)?


Using MinIO as an S3-compatible object store, it seems that starting with release 21.04 Triton no longer automatically generates config files for models that lack them, and instead errors out.
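For reference, a sketch of the kind of invocation in use (the image tag, ports, and credential values are assumptions; the endpoint and repository path are taken from the log below). Triton reads S3 credentials from the standard AWS environment variables, and a host:port in the s3:// URL points it at a non-AWS endpoint such as MinIO:

# Credentials for MinIO (placeholder values)
export AWS_ACCESS_KEY_ID=<minio-access-key>
export AWS_SECRET_ACCESS_KEY=<minio-secret-key>
export AWS_DEFAULT_REGION=us-east-1

# host:port in the s3:// URL selects a custom (non-AWS) S3 endpoint
docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AWS_DEFAULT_REGION \
  nvcr.io/nvidia/tritonserver:21.06.1-py3 \
  tritonserver \
  --model-repository=s3://172.17.0.2:9000/bucket-1/model_repository \
  --strict-model-config=false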

Logs from attempting to run version 21.06.1:

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 21.06 (build 24449615)

Copyright (c) 2018-2021, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use Docker with NVIDIA Container Toolkit to start this container; see
   https://github.com/NVIDIA/nvidia-docker.

I0720 15:30:54.928850 1 libtorch.cc:987] TRITONBACKEND_Initialize: pytorch
I0720 15:30:54.928886 1 libtorch.cc:997] Triton TRITONBACKEND API version: 1.4
I0720 15:30:54.928892 1 libtorch.cc:1003] 'pytorch' TRITONBACKEND API version: 1.4
2021-07-20 15:30:58.469303: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0720 15:30:58.518754 1 tensorflow.cc:2165] TRITONBACKEND_Initialize: tensorflow
I0720 15:30:58.518777 1 tensorflow.cc:2175] Triton TRITONBACKEND API version: 1.4
I0720 15:30:58.518785 1 tensorflow.cc:2181] 'tensorflow' TRITONBACKEND API version: 1.4
I0720 15:30:58.518809 1 tensorflow.cc:2205] backend configuration:
{}
I0720 15:30:58.645847 1 onnxruntime.cc:1969] TRITONBACKEND_Initialize: onnxruntime
I0720 15:30:58.645922 1 onnxruntime.cc:1979] Triton TRITONBACKEND API version: 1.4
I0720 15:30:58.645949 1 onnxruntime.cc:1985] 'onnxruntime' TRITONBACKEND API version: 1.4
I0720 15:30:58.741700 1 openvino.cc:1188] TRITONBACKEND_Initialize: openvino
I0720 15:30:58.741725 1 openvino.cc:1198] Triton TRITONBACKEND API version: 1.4
I0720 15:30:58.741730 1 openvino.cc:1204] 'openvino' TRITONBACKEND API version: 1.4
W0720 15:30:58.743358 1 pinned_memory_manager.cc:236] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
I0720 15:30:58.743396 1 cuda_memory_manager.cc:115] CUDA memory pool disabled
E0720 15:30:58.895833 1 model_repository_manager.cc:1919] Poll failed for model directory 'densenet_onnx': Could not get MetaData for object at s3://172.17.0.2:9000/bucket-1/model_repository/densenet_onnx/config.pbtxt due to exception: , error message: No response body. with address : 172.17.0.2
I0720 15:30:59.186653 1 model_repository_manager.cc:1045] loading: tensorflow_test:1
I0720 15:30:59.352603 1 model_repository_manager.cc:1045] loading: simple_identity:1
I0720 15:30:59.516047 1 tensorflow.cc:2265] TRITONBACKEND_ModelInitialize: simple_identity (version 1)
2021-07-20 15:30:59.516998: I tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /tmp/folderNwTlej/1/model.savedmodel
2021-07-20 15:30:59.518218: I tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2021-07-20 15:30:59.520020: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-07-20 15:30:59.520047: E tensorflow/stream_executor/cuda/cuda_driver.cc:282] failed call to cuInit: UNKNOWN ERROR (303)
2021-07-20 15:30:59.520075: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist
2021-07-20 15:30:59.548411: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2799925000 Hz
2021-07-20 15:30:59.548932: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f701400ddb0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-07-20 15:30:59.548958: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-07-20 15:30:59.549823: I tensorflow/cc/saved_model/loader.cc:251] Restoring SavedModel bundle.
2021-07-20 15:30:59.549876: I tensorflow/cc/saved_model/loader.cc:261] The specified SavedModel has no variables; no checkpoints were restored. File does not exist: /tmp/folderNwTlej/1/model.savedmodel/variables/variables.index
2021-07-20 15:30:59.549914: I tensorflow/cc/saved_model/loader.cc:379] SavedModel load for tags { serve }; Status: success. Took 32918 microseconds.
I0720 15:30:59.550395 1 tensorflow.cc:2314] TRITONBACKEND_ModelInstanceInitialize: simple_identity (CPU device 0)
2021-07-20 15:30:59.550430: I tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /tmp/folderNwTlej/1/model.savedmodel
2021-07-20 15:30:59.551550: I tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2021-07-20 15:30:59.551884: I tensorflow/cc/saved_model/loader.cc:251] Restoring SavedModel bundle.
2021-07-20 15:30:59.551916: I tensorflow/cc/saved_model/loader.cc:261] The specified SavedModel has no variables; no checkpoints were restored. File does not exist: /tmp/folderNwTlej/1/model.savedmodel/variables/variables.index
2021-07-20 15:30:59.551932: I tensorflow/cc/saved_model/loader.cc:379] SavedModel load for tags { serve }; Status: success. Took 1504 microseconds.
I0720 15:30:59.552115 1 model_repository_manager.cc:1212] successfully loaded 'simple_identity' version 1
I0720 15:30:59.654538 1 tensorflow.cc:2265] TRITONBACKEND_ModelInitialize: tensorflow_test (version 1)
2021-07-20 15:30:59.655047: I tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /tmp/folderObrYag/1/model.savedmodel
2021-07-20 15:30:59.722012: I tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2021-07-20 15:30:59.889045: I tensorflow/cc/saved_model/loader.cc:251] Restoring SavedModel bundle.
2021-07-20 15:30:59.889175: I tensorflow/cc/saved_model/loader.cc:261] The specified SavedModel has no variables; no checkpoints were restored. File does not exist: /tmp/folderObrYag/1/model.savedmodel/variables/variables.index
2021-07-20 15:30:59.889236: I tensorflow/cc/saved_model/loader.cc:379] SavedModel load for tags { serve }; Status: success. Took 234193 microseconds.
I0720 15:30:59.938570 1 tensorflow.cc:2314] TRITONBACKEND_ModelInstanceInitialize: tensorflow_test (CPU device 0)
2021-07-20 15:30:59.938616: I tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /tmp/folderObrYag/1/model.savedmodel
2021-07-20 15:30:59.979315: I tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2021-07-20 15:31:00.142792: I tensorflow/cc/saved_model/loader.cc:251] Restoring SavedModel bundle.
2021-07-20 15:31:00.142861: I tensorflow/cc/saved_model/loader.cc:261] The specified SavedModel has no variables; no checkpoints were restored. File does not exist: /tmp/folderObrYag/1/model.savedmodel/variables/variables.index
2021-07-20 15:31:00.142898: I tensorflow/cc/saved_model/loader.cc:379] SavedModel load for tags { serve }; Status: success. Took 204285 microseconds.
I0720 15:31:00.143226 1 model_repository_manager.cc:1212] successfully loaded 'tensorflow_test' version 1
I0720 15:31:00.143350 1 server.cc:504] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0720 15:31:00.143507 1 server.cc:543] 
+-------------+-----------------------------------------------------------------+--------+
| Backend     | Path                                                            | Config |
+-------------+-----------------------------------------------------------------+--------+
| tensorrt    | <built-in>                                                      | {}     |
| pytorch     | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so         | {}     |
| tensorflow  | /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so | {}     |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {}     |
| openvino    | /opt/tritonserver/backends/openvino/libtriton_openvino.so       | {}     |
+-------------+-----------------------------------------------------------------+--------+

I0720 15:31:00.143550 1 server.cc:586] 
+-----------------+---------+--------+
| Model           | Version | Status |
+-----------------+---------+--------+
| simple_identity | 1       | READY  |
| tensorflow_test | 1       | READY  |
+-----------------+---------+--------+

I0720 15:31:00.143664 1 tritonserver.cc:1718] 
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                  |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                 |
| server_version                   | 2.11.0                                                                                                                                                                                 |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
| model_repository_path[0]         | s3://172.17.0.2:9000/bucket-1/model_repository                                                                                                                                   |
| model_control_mode               | MODE_NONE                                                                                                                                                                              |
| strict_model_config              | 0                                                                                                                                                                                      |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                              |
| min_supported_compute_capability | 6.0                                                                                                                                                                                    |
| strict_readiness                 | 1                                                                                                                                                                                      |
| exit_timeout                     | 30                                                                                                                                                                                     |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0720 15:31:00.143674 1 server.cc:234] Waiting for in-flight requests to complete.
I0720 15:31:00.143679 1 model_repository_manager.cc:1078] unloading: tensorflow_test:1
I0720 15:31:00.143712 1 model_repository_manager.cc:1078] unloading: simple_identity:1
I0720 15:31:00.143757 1 tensorflow.cc:2352] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0720 15:31:00.143787 1 tensorflow.cc:2291] TRITONBACKEND_ModelFinalize: delete model state
I0720 15:31:00.143790 1 server.cc:249] Timeout 30: Found 2 live models and 0 in-flight non-inference requests
I0720 15:31:00.143914 1 tensorflow.cc:2352] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0720 15:31:00.143933 1 tensorflow.cc:2291] TRITONBACKEND_ModelFinalize: delete model state
I0720 15:31:00.144261 1 model_repository_manager.cc:1195] successfully unloaded 'simple_identity' version 1
I0720 15:31:00.199582 1 model_repository_manager.cc:1195] successfully unloaded 'tensorflow_test' version 1
I0720 15:31:01.143922 1 server.cc:249] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models

This error stands out to me the most:

E0720 15:30:58.895833 1 model_repository_manager.cc:1919] Poll failed for model directory 'densenet_onnx': Could not get MetaData for object at s3://172.17.0.2:9000/bucket-1/model_repository/densenet_onnx/config.pbtxt due to exception: , error message: No response body. with address : 172.17.0.2
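The "Could not get MetaData" failure appears to correspond to an S3 HeadObject call on the (nonexistent) config.pbtxt. It can be reproduced against MinIO directly; a sketch, assuming the aws CLI is available and configured with the same MinIO credentials:

# Reproduce the metadata lookup Triton performs during repository polling
aws s3api head-object \
  --endpoint-url http://172.17.0.2:9000 \
  --bucket bucket-1 \
  --key model_repository/densenet_onnx/config.pbtxt

Since the object does not exist, MinIO answers the HeadObject with a bodyless 404, which lines up with the "No response body." in the error message: instead of treating the missing file as a cue to autogenerate the config, the server surfaces it as a poll failure.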

Also worth noting: when loading the same models from local disk, Triton generates the config without issue.
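As an interim workaround, an explicit config.pbtxt can be uploaded next to the model in the bucket. A sketch for densenet_onnx, assuming it is the model from Triton's quickstart example; tensor names, shapes, and the label file must match the actual model, and the label_filename line can be dropped if densenet_labels.txt is not present:

name: "densenet_onnx"
platform: "onnxruntime_onnx"
max_batch_size: 0
input [
  {
    name: "data_0"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 224, 224 ]
    reshape { shape: [ 1, 3, 224, 224 ] }
  }
]
output [
  {
    name: "fc6_1"
    data_type: TYPE_FP32
    dims: [ 1000 ]
    reshape { shape: [ 1, 1000, 1, 1 ] }
    label_filename: "densenet_labels.txt"
  }
]

Once the server starts cleanly, the configuration Triton actually loaded can be inspected with curl http://localhost:8000/v2/models/densenet_onnx/config.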

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments:11 (5 by maintainers)

Top GitHub Comments

2 reactions
LMarino1 commented, Jul 20, 2021

Yes, I specified --strict-model-config=false both times. Thanks for looking into it!

0 reactions
dyastremsky commented, Aug 30, 2021

Thanks for updating us! Reopening the issue. We will investigate.


Top Results From Across the Web

Tensorflow MNIST model and Triton (e2e example)
Setup MinIO. Use the provided notebook to install MinIO in your cluster. Instructions also online. We will assume that MinIO service is...
Serve multiple models with Amazon SageMaker and Triton ...
The Triton server loads multiple models and exposes ports 8000, 8001, and 8002 as gRPC, HTTP, and metrics server. The Flask server listens...
Deploy Nvidia Triton Inference Server with MinIO as Model Store
This tutorial shows how to set up the Nvidia Triton Inference Server that treats the MinIO tenant as a model store.
Is there a way to get the config.pbtxt file from triton inferencing ...
This would enable it to create its own config file while loading the model from the model repository. sudo docker run --rm --net=host -p...
MinIO and Apache Arrow Using R
MinIO is high-performance software-defined S3 compatible object storage, ... It's the code to set the environment and load the required ...
