
Triton requiring config.pbtxt when loading models from S3 (MinIO)?


Using MinIO as an S3-compatible object store, it seems that starting with release 21.04 Triton no longer automatically generates config files for models that lack them, and instead errors out.
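For reference, a sketch of the kind of invocation in use (the image tag, ports, and credential values are assumptions; the endpoint and repository path are taken from the log below). Triton reads S3 credentials from the standard AWS environment variables, and a host:port in the s3:// URL points it at a non-AWS endpoint such as MinIO:

# Credentials for MinIO (placeholder values)
export AWS_ACCESS_KEY_ID=<minio-access-key>
export AWS_SECRET_ACCESS_KEY=<minio-secret-key>
export AWS_DEFAULT_REGION=us-east-1

# host:port in the s3:// URL selects a custom (non-AWS) S3 endpoint
docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AWS_DEFAULT_REGION \
  nvcr.io/nvidia/tritonserver:21.06.1-py3 \
  tritonserver \
  --model-repository=s3://172.17.0.2:9000/bucket-1/model_repository \
  --strict-model-config=false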

Logs from attempting to run version 21.06.1:

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 21.06 (build 24449615)

Copyright (c) 2018-2021, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use Docker with NVIDIA Container Toolkit to start this container; see
   https://github.com/NVIDIA/nvidia-docker.

I0720 15:30:54.928850 1 libtorch.cc:987] TRITONBACKEND_Initialize: pytorch
I0720 15:30:54.928886 1 libtorch.cc:997] Triton TRITONBACKEND API version: 1.4
I0720 15:30:54.928892 1 libtorch.cc:1003] 'pytorch' TRITONBACKEND API version: 1.4
2021-07-20 15:30:58.469303: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0720 15:30:58.518754 1 tensorflow.cc:2165] TRITONBACKEND_Initialize: tensorflow
I0720 15:30:58.518777 1 tensorflow.cc:2175] Triton TRITONBACKEND API version: 1.4
I0720 15:30:58.518785 1 tensorflow.cc:2181] 'tensorflow' TRITONBACKEND API version: 1.4
I0720 15:30:58.518809 1 tensorflow.cc:2205] backend configuration:
{}
I0720 15:30:58.645847 1 onnxruntime.cc:1969] TRITONBACKEND_Initialize: onnxruntime
I0720 15:30:58.645922 1 onnxruntime.cc:1979] Triton TRITONBACKEND API version: 1.4
I0720 15:30:58.645949 1 onnxruntime.cc:1985] 'onnxruntime' TRITONBACKEND API version: 1.4
I0720 15:30:58.741700 1 openvino.cc:1188] TRITONBACKEND_Initialize: openvino
I0720 15:30:58.741725 1 openvino.cc:1198] Triton TRITONBACKEND API version: 1.4
I0720 15:30:58.741730 1 openvino.cc:1204] 'openvino' TRITONBACKEND API version: 1.4
W0720 15:30:58.743358 1 pinned_memory_manager.cc:236] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
I0720 15:30:58.743396 1 cuda_memory_manager.cc:115] CUDA memory pool disabled
E0720 15:30:58.895833 1 model_repository_manager.cc:1919] Poll failed for model directory 'densenet_onnx': Could not get MetaData for object at s3://172.17.0.2:9000/bucket-1/model_repository/densenet_onnx/config.pbtxt due to exception: , error message: No response body. with address : 172.17.0.2
I0720 15:30:59.186653 1 model_repository_manager.cc:1045] loading: tensorflow_test:1
I0720 15:30:59.352603 1 model_repository_manager.cc:1045] loading: simple_identity:1
I0720 15:30:59.516047 1 tensorflow.cc:2265] TRITONBACKEND_ModelInitialize: simple_identity (version 1)
2021-07-20 15:30:59.516998: I tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /tmp/folderNwTlej/1/model.savedmodel
2021-07-20 15:30:59.518218: I tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2021-07-20 15:30:59.520020: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-07-20 15:30:59.520047: E tensorflow/stream_executor/cuda/cuda_driver.cc:282] failed call to cuInit: UNKNOWN ERROR (303)
2021-07-20 15:30:59.520075: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist
2021-07-20 15:30:59.548411: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2799925000 Hz
2021-07-20 15:30:59.548932: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f701400ddb0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-07-20 15:30:59.548958: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-07-20 15:30:59.549823: I tensorflow/cc/saved_model/loader.cc:251] Restoring SavedModel bundle.
2021-07-20 15:30:59.549876: I tensorflow/cc/saved_model/loader.cc:261] The specified SavedModel has no variables; no checkpoints were restored. File does not exist: /tmp/folderNwTlej/1/model.savedmodel/variables/variables.index
2021-07-20 15:30:59.549914: I tensorflow/cc/saved_model/loader.cc:379] SavedModel load for tags { serve }; Status: success. Took 32918 microseconds.
I0720 15:30:59.550395 1 tensorflow.cc:2314] TRITONBACKEND_ModelInstanceInitialize: simple_identity (CPU device 0)
2021-07-20 15:30:59.550430: I tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /tmp/folderNwTlej/1/model.savedmodel
2021-07-20 15:30:59.551550: I tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2021-07-20 15:30:59.551884: I tensorflow/cc/saved_model/loader.cc:251] Restoring SavedModel bundle.
2021-07-20 15:30:59.551916: I tensorflow/cc/saved_model/loader.cc:261] The specified SavedModel has no variables; no checkpoints were restored. File does not exist: /tmp/folderNwTlej/1/model.savedmodel/variables/variables.index
2021-07-20 15:30:59.551932: I tensorflow/cc/saved_model/loader.cc:379] SavedModel load for tags { serve }; Status: success. Took 1504 microseconds.
I0720 15:30:59.552115 1 model_repository_manager.cc:1212] successfully loaded 'simple_identity' version 1
I0720 15:30:59.654538 1 tensorflow.cc:2265] TRITONBACKEND_ModelInitialize: tensorflow_test (version 1)
2021-07-20 15:30:59.655047: I tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /tmp/folderObrYag/1/model.savedmodel
2021-07-20 15:30:59.722012: I tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2021-07-20 15:30:59.889045: I tensorflow/cc/saved_model/loader.cc:251] Restoring SavedModel bundle.
2021-07-20 15:30:59.889175: I tensorflow/cc/saved_model/loader.cc:261] The specified SavedModel has no variables; no checkpoints were restored. File does not exist: /tmp/folderObrYag/1/model.savedmodel/variables/variables.index
2021-07-20 15:30:59.889236: I tensorflow/cc/saved_model/loader.cc:379] SavedModel load for tags { serve }; Status: success. Took 234193 microseconds.
I0720 15:30:59.938570 1 tensorflow.cc:2314] TRITONBACKEND_ModelInstanceInitialize: tensorflow_test (CPU device 0)
2021-07-20 15:30:59.938616: I tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /tmp/folderObrYag/1/model.savedmodel
2021-07-20 15:30:59.979315: I tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2021-07-20 15:31:00.142792: I tensorflow/cc/saved_model/loader.cc:251] Restoring SavedModel bundle.
2021-07-20 15:31:00.142861: I tensorflow/cc/saved_model/loader.cc:261] The specified SavedModel has no variables; no checkpoints were restored. File does not exist: /tmp/folderObrYag/1/model.savedmodel/variables/variables.index
2021-07-20 15:31:00.142898: I tensorflow/cc/saved_model/loader.cc:379] SavedModel load for tags { serve }; Status: success. Took 204285 microseconds.
I0720 15:31:00.143226 1 model_repository_manager.cc:1212] successfully loaded 'tensorflow_test' version 1
I0720 15:31:00.143350 1 server.cc:504] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0720 15:31:00.143507 1 server.cc:543] 
+-------------+-----------------------------------------------------------------+--------+
| Backend     | Path                                                            | Config |
+-------------+-----------------------------------------------------------------+--------+
| tensorrt    | <built-in>                                                      | {}     |
| pytorch     | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so         | {}     |
| tensorflow  | /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so | {}     |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {}     |
| openvino    | /opt/tritonserver/backends/openvino/libtriton_openvino.so       | {}     |
+-------------+-----------------------------------------------------------------+--------+

I0720 15:31:00.143550 1 server.cc:586] 
+-----------------+---------+--------+
| Model           | Version | Status |
+-----------------+---------+--------+
| simple_identity | 1       | READY  |
| tensorflow_test | 1       | READY  |
+-----------------+---------+--------+

I0720 15:31:00.143664 1 tritonserver.cc:1718] 
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                  |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                 |
| server_version                   | 2.11.0                                                                                                                                                                                 |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
| model_repository_path[0]         | s3://172.17.0.2:9000/bucket-1/model_repository                                                                                                                                   |
| model_control_mode               | MODE_NONE                                                                                                                                                                              |
| strict_model_config              | 0                                                                                                                                                                                      |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                              |
| min_supported_compute_capability | 6.0                                                                                                                                                                                    |
| strict_readiness                 | 1                                                                                                                                                                                      |
| exit_timeout                     | 30                                                                                                                                                                                     |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0720 15:31:00.143674 1 server.cc:234] Waiting for in-flight requests to complete.
I0720 15:31:00.143679 1 model_repository_manager.cc:1078] unloading: tensorflow_test:1
I0720 15:31:00.143712 1 model_repository_manager.cc:1078] unloading: simple_identity:1
I0720 15:31:00.143757 1 tensorflow.cc:2352] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0720 15:31:00.143787 1 tensorflow.cc:2291] TRITONBACKEND_ModelFinalize: delete model state
I0720 15:31:00.143790 1 server.cc:249] Timeout 30: Found 2 live models and 0 in-flight non-inference requests
I0720 15:31:00.143914 1 tensorflow.cc:2352] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0720 15:31:00.143933 1 tensorflow.cc:2291] TRITONBACKEND_ModelFinalize: delete model state
I0720 15:31:00.144261 1 model_repository_manager.cc:1195] successfully unloaded 'simple_identity' version 1
I0720 15:31:00.199582 1 model_repository_manager.cc:1195] successfully unloaded 'tensorflow_test' version 1
I0720 15:31:01.143922 1 server.cc:249] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models

This error stands out to me the most:

E0720 15:30:58.895833 1 model_repository_manager.cc:1919] Poll failed for model directory 'densenet_onnx': Could not get MetaData for object at s3://172.17.0.2:9000/bucket-1/model_repository/densenet_onnx/config.pbtxt due to exception: , error message: No response body. with address : 172.17.0.2
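The "Could not get MetaData" failure appears to correspond to an S3 HeadObject call on the (nonexistent) config.pbtxt. It can be reproduced against MinIO directly; a sketch, assuming the aws CLI is available and configured with the same MinIO credentials:

# Reproduce the metadata lookup Triton performs during repository polling
aws s3api head-object \
  --endpoint-url http://172.17.0.2:9000 \
  --bucket bucket-1 \
  --key model_repository/densenet_onnx/config.pbtxt

Since the object does not exist, MinIO answers the HeadObject with a bodyless 404, which lines up with the "No response body." in the error message: instead of treating the missing file as a cue to autogenerate the config, the server surfaces it as a poll failure.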

Also worth noting: when loading the same models from local disk, Triton generates the config without issue.
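As an interim workaround, an explicit config.pbtxt can be uploaded next to the model in the bucket. A sketch for densenet_onnx, assuming it is the model from Triton's quickstart example; tensor names, shapes, and the label file must match the actual model, and the label_filename line can be dropped if densenet_labels.txt is not present:

name: "densenet_onnx"
platform: "onnxruntime_onnx"
max_batch_size: 0
input [
  {
    name: "data_0"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 224, 224 ]
    reshape { shape: [ 1, 3, 224, 224 ] }
  }
]
output [
  {
    name: "fc6_1"
    data_type: TYPE_FP32
    dims: [ 1000 ]
    reshape { shape: [ 1, 1000, 1, 1 ] }
    label_filename: "densenet_labels.txt"
  }
]

Once the server starts cleanly, the configuration Triton actually loaded can be inspected with curl http://localhost:8000/v2/models/densenet_onnx/config.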

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments:11 (5 by maintainers)

Top GitHub Comments

2 reactions
LMarino1 commented, Jul 20, 2021

Yes, I specified --strict-model-config=false both times. Thanks for looking into it!

0 reactions
dyastremsky commented, Aug 30, 2021

Thanks for updating us! Reopening the issue. We will investigate.


Top Results From Across the Web

Tensorflow MNIST model and Triton (e2e example)
Setup MinIO. Use the provided notebook to install MinIO in your cluster. Instructions also online. We will assume that MinIO service is...
Serve multiple models with Amazon SageMaker and Triton ...
The Triton server loads multiple models and exposes ports 8000, 8001, and 8002 as gRPC, HTTP, and metrics server. The Flask server listens...
Deploy Nvidia Triton Inference Server with MinIO as Model Store
This tutorial shows how to set up the Nvidia Triton Inference Server that treats the MinIO tenant as a model store.
Is there a way to get the config.pbtxt file from triton inferencing ...
This would enable it to create its own config file while loading the model from the model repository. sudo docker run --rm --net=host -p...
MinIO and Apache Arrow Using R
MinIO is high-performance software-defined S3 compatible object storage, ... It's the code to set the environment and load the required ...
