Triton requiring config.pbtxt when loading models from s3 (MinIO)?
Using MinIO for S3 storage, it seems that starting with release 21.04, Triton no longer automatically generates config files for models that lack them and instead errors out.
Logs from attempting to run version 21.06.1:
=============================
== Triton Inference Server ==
=============================

NVIDIA Release 21.06 (build 24449615)

Copyright (c) 2018-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use Docker with NVIDIA Container Toolkit to start this container; see
https://github.com/NVIDIA/nvidia-docker.

I0720 15:30:54.928850 1 libtorch.cc:987] TRITONBACKEND_Initialize: pytorch
I0720 15:30:54.928886 1 libtorch.cc:997] Triton TRITONBACKEND API version: 1.4
I0720 15:30:54.928892 1 libtorch.cc:1003] 'pytorch' TRITONBACKEND API version: 1.4
2021-07-20 15:30:58.469303: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0720 15:30:58.518754 1 tensorflow.cc:2165] TRITONBACKEND_Initialize: tensorflow
I0720 15:30:58.518777 1 tensorflow.cc:2175] Triton TRITONBACKEND API version: 1.4
I0720 15:30:58.518785 1 tensorflow.cc:2181] 'tensorflow' TRITONBACKEND API version: 1.4
I0720 15:30:58.518809 1 tensorflow.cc:2205] backend configuration: {}
I0720 15:30:58.645847 1 onnxruntime.cc:1969] TRITONBACKEND_Initialize: onnxruntime
I0720 15:30:58.645922 1 onnxruntime.cc:1979] Triton TRITONBACKEND API version: 1.4
I0720 15:30:58.645949 1 onnxruntime.cc:1985] 'onnxruntime' TRITONBACKEND API version: 1.4
I0720 15:30:58.741700 1 openvino.cc:1188] TRITONBACKEND_Initialize: openvino
I0720 15:30:58.741725 1 openvino.cc:1198] Triton TRITONBACKEND API version: 1.4
I0720 15:30:58.741730 1 openvino.cc:1204] 'openvino' TRITONBACKEND API version: 1.4
W0720 15:30:58.743358 1 pinned_memory_manager.cc:236] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
I0720 15:30:58.743396 1 cuda_memory_manager.cc:115] CUDA memory pool disabled
E0720 15:30:58.895833 1 model_repository_manager.cc:1919] Poll failed for model directory 'densenet_onnx': Could not get MetaData for object at s3://172.17.0.2:9000/bucket-1/model_repository/densenet_onnx/config.pbtxt due to exception: , error message: No response body. with address : 172.17.0.2
I0720 15:30:59.186653 1 model_repository_manager.cc:1045] loading: tensorflow_test:1
I0720 15:30:59.352603 1 model_repository_manager.cc:1045] loading: simple_identity:1
I0720 15:30:59.516047 1 tensorflow.cc:2265] TRITONBACKEND_ModelInitialize: simple_identity (version 1)
2021-07-20 15:30:59.516998: I tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /tmp/folderNwTlej/1/model.savedmodel
2021-07-20 15:30:59.518218: I tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2021-07-20 15:30:59.520020: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-07-20 15:30:59.520047: E tensorflow/stream_executor/cuda/cuda_driver.cc:282] failed call to cuInit: UNKNOWN ERROR (303)
2021-07-20 15:30:59.520075: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist
2021-07-20 15:30:59.548411: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2799925000 Hz
2021-07-20 15:30:59.548932: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f701400ddb0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-07-20 15:30:59.548958: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-07-20 15:30:59.549823: I tensorflow/cc/saved_model/loader.cc:251] Restoring SavedModel bundle.
2021-07-20 15:30:59.549876: I tensorflow/cc/saved_model/loader.cc:261] The specified SavedModel has no variables; no checkpoints were restored. File does not exist: /tmp/folderNwTlej/1/model.savedmodel/variables/variables.index
2021-07-20 15:30:59.549914: I tensorflow/cc/saved_model/loader.cc:379] SavedModel load for tags { serve }; Status: success. Took 32918 microseconds.
I0720 15:30:59.550395 1 tensorflow.cc:2314] TRITONBACKEND_ModelInstanceInitialize: simple_identity (CPU device 0)
2021-07-20 15:30:59.550430: I tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /tmp/folderNwTlej/1/model.savedmodel
2021-07-20 15:30:59.551550: I tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2021-07-20 15:30:59.551884: I tensorflow/cc/saved_model/loader.cc:251] Restoring SavedModel bundle.
2021-07-20 15:30:59.551916: I tensorflow/cc/saved_model/loader.cc:261] The specified SavedModel has no variables; no checkpoints were restored. File does not exist: /tmp/folderNwTlej/1/model.savedmodel/variables/variables.index
2021-07-20 15:30:59.551932: I tensorflow/cc/saved_model/loader.cc:379] SavedModel load for tags { serve }; Status: success. Took 1504 microseconds.
I0720 15:30:59.552115 1 model_repository_manager.cc:1212] successfully loaded 'simple_identity' version 1
I0720 15:30:59.654538 1 tensorflow.cc:2265] TRITONBACKEND_ModelInitialize: tensorflow_test (version 1)
2021-07-20 15:30:59.655047: I tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /tmp/folderObrYag/1/model.savedmodel
2021-07-20 15:30:59.722012: I tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2021-07-20 15:30:59.889045: I tensorflow/cc/saved_model/loader.cc:251] Restoring SavedModel bundle.
2021-07-20 15:30:59.889175: I tensorflow/cc/saved_model/loader.cc:261] The specified SavedModel has no variables; no checkpoints were restored. File does not exist: /tmp/folderObrYag/1/model.savedmodel/variables/variables.index
2021-07-20 15:30:59.889236: I tensorflow/cc/saved_model/loader.cc:379] SavedModel load for tags { serve }; Status: success. Took 234193 microseconds.
I0720 15:30:59.938570 1 tensorflow.cc:2314] TRITONBACKEND_ModelInstanceInitialize: tensorflow_test (CPU device 0)
2021-07-20 15:30:59.938616: I tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /tmp/folderObrYag/1/model.savedmodel
2021-07-20 15:30:59.979315: I tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2021-07-20 15:31:00.142792: I tensorflow/cc/saved_model/loader.cc:251] Restoring SavedModel bundle.
2021-07-20 15:31:00.142861: I tensorflow/cc/saved_model/loader.cc:261] The specified SavedModel has no variables; no checkpoints were restored. File does not exist: /tmp/folderObrYag/1/model.savedmodel/variables/variables.index
2021-07-20 15:31:00.142898: I tensorflow/cc/saved_model/loader.cc:379] SavedModel load for tags { serve }; Status: success. Took 204285 microseconds.
I0720 15:31:00.143226 1 model_repository_manager.cc:1212] successfully loaded 'tensorflow_test' version 1
I0720 15:31:00.143350 1 server.cc:504]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0720 15:31:00.143507 1 server.cc:543]
+-------------+-----------------------------------------------------------------+--------+
| Backend     | Path                                                            | Config |
+-------------+-----------------------------------------------------------------+--------+
| tensorrt    | <built-in>                                                      | {}     |
| pytorch     | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so         | {}     |
| tensorflow  | /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so | {}     |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {}     |
| openvino    | /opt/tritonserver/backends/openvino/libtriton_openvino.so       | {}     |
+-------------+-----------------------------------------------------------------+--------+
I0720 15:31:00.143550 1 server.cc:586]
+-----------------+---------+--------+
| Model           | Version | Status |
+-----------------+---------+--------+
| simple_identity | 1       | READY  |
| tensorflow_test | 1       | READY  |
+-----------------+---------+--------+
I0720 15:31:00.143664 1 tritonserver.cc:1718]
+----------------------------------+------------------------------------------------+
| Option                           | Value                                          |
+----------------------------------+------------------------------------------------+
| server_id                        | triton                                         |
| server_version                   | 2.11.0                                         |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
| model_repository_path[0]         | s3://172.17.0.2:9000/bucket-1/model_repository |
| model_control_mode               | MODE_NONE                                      |
| strict_model_config              | 0                                              |
| pinned_memory_pool_byte_size     | 268435456                                      |
| min_supported_compute_capability | 6.0                                            |
| strict_readiness                 | 1                                              |
| exit_timeout                     | 30                                             |
+----------------------------------+------------------------------------------------+
I0720 15:31:00.143674 1 server.cc:234] Waiting for in-flight requests to complete.
I0720 15:31:00.143679 1 model_repository_manager.cc:1078] unloading: tensorflow_test:1
I0720 15:31:00.143712 1 model_repository_manager.cc:1078] unloading: simple_identity:1
I0720 15:31:00.143757 1 tensorflow.cc:2352] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0720 15:31:00.143787 1 tensorflow.cc:2291] TRITONBACKEND_ModelFinalize: delete model state
I0720 15:31:00.143790 1 server.cc:249] Timeout 30: Found 2 live models and 0 in-flight non-inference requests
I0720 15:31:00.143914 1 tensorflow.cc:2352] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0720 15:31:00.143933 1 tensorflow.cc:2291] TRITONBACKEND_ModelFinalize: delete model state
I0720 15:31:00.144261 1 model_repository_manager.cc:1195] successfully unloaded 'simple_identity' version 1
I0720 15:31:00.199582 1 model_repository_manager.cc:1195] successfully unloaded 'tensorflow_test' version 1
I0720 15:31:01.143922 1 server.cc:249] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
This error naturally stands out to me the most:
E0720 15:30:58.895833 1 model_repository_manager.cc:1919] Poll failed for model directory 'densenet_onnx': Could not get MetaData for object at s3://172.17.0.2:9000/bucket-1/model_repository/densenet_onnx/config.pbtxt due to exception: , error message: No response body. with address : 172.17.0.2
Also worth noting: when the same models are loaded from local disk, Triton generates the config without issue.
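As a workaround while auto-generation fails over S3, an explicit config.pbtxt can be uploaded alongside the model in the bucket. A minimal sketch for the densenet_onnx example model follows; the tensor names and dims here are assumptions based on the DenseNet model from the Triton quickstart and should be verified against the actual model:

```protobuf
# model_repository/densenet_onnx/config.pbtxt (sketch; verify names/dims)
name: "densenet_onnx"
platform: "onnxruntime_onnx"
max_batch_size: 0
input [
  {
    name: "data_0"          # assumed input tensor name
    data_type: TYPE_FP32
    dims: [ 1, 3, 224, 224 ]
  }
]
output [
  {
    name: "fc6_1"           # assumed output tensor name
    data_type: TYPE_FP32
    dims: [ 1, 1000, 1, 1 ]
  }
]
```

With the file present in the bucket, the poll for `config.pbtxt` should succeed regardless of whether auto-generation works.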
Issue Analytics
- State:
- Created 2 years ago
- Comments: 11 (5 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Yes, I specified --strict-model-config=false both times. Thanks for looking into it!

Thanks for updating us! Reopening the issue. We will investigate.
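For reference, the flag is passed at server start together with the S3-style credentials MinIO expects. A sketch of the launch, with the endpoint and bucket taken from this issue's logs and the image tag, credentials, and networking being assumptions to adapt:

```shell
# MinIO speaks the S3 protocol, so Triton reads the standard AWS variables.
# These values are placeholders -- substitute your MinIO credentials.
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin

docker run --rm --net=host \
  -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY \
  nvcr.io/nvidia/tritonserver:21.06-py3 \
  tritonserver \
    --model-repository=s3://172.17.0.2:9000/bucket-1/model_repository \
    --strict-model-config=false
```

The `s3://host:port/bucket/path` form tells Triton to use a custom S3 endpoint rather than AWS, which matches the `model_repository_path[0]` shown in the startup table above. With `--strict-model-config=false`, Triton is expected to derive a minimal config for models without one, which is exactly the behavior the reporter saw stop working over MinIO in 21.04+.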