Error - "Internal - failed to load all models"
I have been working on this for two days, but it remains unresolved. I have also read through the related issues.
computer environment
- os : ubuntu 20.04
- cuda : 11.1
- cudnn : 8.0.5
- tensorflow : 2.4.0
- triton : 20.12-py3
- gpu : RTX3090
- python : 3.8.5
model directory tree
The model is an EfficientDet-D4-based transfer-learning model.
model config.pbtxt
just two lines
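The two lines themselves are not reproduced in the issue, so the following is only a guess at what they contain. For a TensorFlow SavedModel served with `--strict-model-config=false`, a minimal two-line config.pbtxt typically declares just the model name and platform, letting Triton auto-complete the rest:

```
name: "detection"
platform: "tensorflow_savedmodel"
```

Note that the auto-completion step is exactly where the load fails in the logs below ("unable to auto-complete model configuration"), because Triton's TensorFlow backend must load the graph to infer inputs and outputs.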
command docker run --rm --gpus=1 --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v/home/sub/models:/models nvcr.io/nvidia/tritonserver:20.12-py3 tritonserver --model-repository=/models --strict-model-config=false
I get the same error whether or not I pass the --gpus option.
First, I thought it was a version-compatibility issue, so I tried many different combinations. Second, I thought it was a problem with the RTX 30xx series, so I checked the CUDA and NVIDIA driver versions carefully.
Neither approach resolved it, so I am opening this issue to ask for help.
This is my first time filing an issue, so please let me know right away if I have done anything wrong.
LOGS
docker run --rm --gpus=1 --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v/home/sub/models:/models nvcr.io/nvidia/tritonserver:20.12-py3 tritonserver --model-repository=/models --strict-model-config=false
=============================
== Triton Inference Server ==
=============================
NVIDIA Release 20.12 (build 18156940)
Copyright (c) 2018-2020, NVIDIA CORPORATION. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.
I0225 10:47:54.982586 1 metrics.cc:221] Collecting metrics for GPU 0: GeForce RTX 3090
I0225 10:47:55.235226 1 libtorch.cc:945] TRITONBACKEND_Initialize: pytorch
I0225 10:47:55.235264 1 libtorch.cc:955] Triton TRITONBACKEND API version: 1.0
I0225 10:47:55.235273 1 libtorch.cc:961] 'pytorch' TRITONBACKEND API version: 1.0
2021-02-25 10:47:55.418818: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0225 10:47:55.458605 1 tensorflow.cc:1877] TRITONBACKEND_Initialize: tensorflow
I0225 10:47:55.458658 1 tensorflow.cc:1887] Triton TRITONBACKEND API version: 1.0
I0225 10:47:55.458671 1 tensorflow.cc:1893] 'tensorflow' TRITONBACKEND API version: 1.0
I0225 10:47:55.458682 1 tensorflow.cc:1917] backend configuration:
{}
I0225 10:47:55.462070 1 onnxruntime.cc:1715] TRITONBACKEND_Initialize: onnxruntime
I0225 10:47:55.462106 1 onnxruntime.cc:1725] Triton TRITONBACKEND API version: 1.0
I0225 10:47:55.462119 1 onnxruntime.cc:1731] 'onnxruntime' TRITONBACKEND API version: 1.0
I0225 10:47:55.606966 1 pinned_memory_manager.cc:199] Pinned memory pool is created at '0x7fab9e000000' with size 268435456
I0225 10:47:55.607336 1 cuda_memory_manager.cc:103] CUDA memory pool is created on device 0 with size 67108864
I0225 10:47:55.608844 1 model_repository_manager.cc:787] loading: detection:1
I0225 10:47:55.709285 1 tensorflow.cc:1977] TRITONBACKEND_ModelInitialize: detection (version 1)
2021-02-25 10:47:55.711208: I tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /models/detection/1/model.savedmodel
2021-02-25 10:47:55.767476: I tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2021-02-25 10:47:55.845168: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-02-25 10:47:55.846284: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1665] Found device 0 with properties:
name: GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.755
pciBusID: 0000:1a:00.0
2021-02-25 10:47:55.846302: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-02-25 10:47:55.846321: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-02-25 10:47:55.846344: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-02-25 10:47:55.846358: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-02-25 10:47:55.846371: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2021-02-25 10:47:55.846383: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-02-25 10:47:55.846395: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-02-25 10:47:55.848297: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1793] Adding visible gpu devices: 0
2021-02-25 10:47:58.378540: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-02-25 10:47:58.378570: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] 0
2021-02-25 10:47:58.378573: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1225] 0: N
2021-02-25 10:47:58.381536: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1351] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 21728 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3090, pci bus id: 0000:1a:00.0, compute capability: 8.6)
2021-02-25 10:47:58.396908: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55dd97a710d0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-02-25 10:47:58.396931: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 3090, Compute Capability 8.6
2021-02-25 10:47:58.417114: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3299990000 Hz
2021-02-25 10:47:58.418693: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fa9d27ecfa0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-02-25 10:47:58.418721: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-02-25 10:47:58.652328: I tensorflow/cc/saved_model/loader.cc:379] SavedModel load for tags { serve }; Status: fail. Took 2941125 microseconds.
I0225 10:47:58.699850 1 tensorflow.cc:2003] TRITONBACKEND_ModelFinalize: delete model state
E0225 10:47:58.699886 1 model_repository_manager.cc:963] failed to load 'detection' version 1: Internal: unable to auto-complete model configuration for 'detection', failed to load model: Op type not registered 'DecodeImage' in binary running on ebee40abd21b. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
I0225 10:47:58.700176 1 server.cc:490]
+-------------+-----------------------------------------------------------------+------+
| Backend | Config | Path |
+-------------+-----------------------------------------------------------------+------+
| pytorch | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so | {} |
| tensorflow | /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so | {} |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {} |
+-------------+-----------------------------------------------------------------+------+
I0225 10:47:58.700246 1 server.cc:533]
+-----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+-----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| detection | 1 | UNAVAILABLE: Internal: unable to auto-complete model configuration for 'detection', failed to load model: Op type not registered 'DecodeImage' in binary running on ebee40abd21b. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is |
| | | first accessed. |
+-----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0225 10:47:58.700324 1 tritonserver.cc:1620]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.6.0 |
| server_extensions | classification sequence model_repository schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
| model_repository_path[0] | /models |
| model_control_mode | MODE_NONE |
| strict_model_config | 0 |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+
I0225 10:47:58.700330 1 server.cc:215] Waiting for in-flight requests to complete.
I0225 10:47:58.700340 1 server.cc:230] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
Issue Analytics
- Created 3 years ago
- Comments: 7 (3 by maintainers)
Top GitHub Comments
This may be because 20.12 uses TF 2.3 and 21.02 uses TF 2.4.
I solved it by upgrading Triton from 20.12 to 21.02. I don't know exactly why, but the version upgrade fixed the problem.
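This is consistent with the error message: `DecodeImage` is an op that only exists in newer TensorFlow builds, so a SavedModel exported with TF 2.4 cannot load in a server whose backend bundles an older TF. Before upgrading, you can check whether a SavedModel references a given op. A proper check would parse the protobuf with TensorFlow installed; the sketch below is a cruder, TF-free heuristic that simply scans the serialized graph for the op-name bytes (op names are stored as plain strings inside the protobuf, so a byte search finds most uses, though it can produce false positives on substrings):

```python
def graph_uses_op(saved_model_pb_path: str, op_name: str) -> bool:
    """Crude heuristic: report whether a serialized SavedModel/GraphDef
    appears to reference the given op, by raw byte search."""
    with open(saved_model_pb_path, "rb") as f:
        return op_name.encode("utf-8") in f.read()


# Hypothetical usage against the model path from the logs:
# graph_uses_op("/models/detection/1/model.savedmodel/saved_model.pb",
#               "DecodeImage")
```

If this returns True for an op your serving TF version lacks, the options are upgrading the server (as done here, 20.12 -> 21.02) or re-exporting the model so the unsupported op is kept out of the serving graph.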
close