CPU-only mode: unable to load models, got CUDA error
Problem Description

I was trying to follow the official example, starting the server on a CPU-only device by calling the command:
docker run --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/Users/tamannaverma/triton-inference-server/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:22.01-py3 tritonserver --model-repository=/models
Here are the logs:
> =============================
> == Triton Inference Server ==
> =============================
>
> NVIDIA Release 22.01 (build 31237564)
>
> Copyright (c) 2018-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
>
> Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
>
> This container image and its contents are governed by the NVIDIA Deep Learning Container License.
> By pulling and using the container, you accept the terms and conditions of this license:
> https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
> find: File system loop detected; '/usr/local/cuda-11.6/compat/lib.real/lib.real' is part of the same file system loop as '/usr/local/cuda-11.6/compat/lib.real'.
> find: File system loop detected; '/usr/local/cuda-11.6/compat/lib/lib.real' is part of the same file system loop as '/usr/local/cuda-11.6/compat/lib'.
> find: File system loop detected; '/usr/local/cuda-11/compat/lib.real/lib.real' is part of the same file system loop as '/usr/local/cuda-11/compat/lib.real'.
> find: File system loop detected; '/usr/local/cuda-11/compat/lib/lib.real' is part of the same file system loop as '/usr/local/cuda-11/compat/lib'.
> find: File system loop detected; '/usr/local/cuda/compat/lib.real/lib.real' is part of the same file system loop as '/usr/local/cuda/compat/lib.real'.
> find: File system loop detected; '/usr/local/cuda/compat/lib/lib.real' is part of the same file system loop as '/usr/local/cuda/compat/lib'.
>
> WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
> Use Docker with NVIDIA Container Toolkit to start this container; see
> https://github.com/NVIDIA/nvidia-docker.
>
> WARNING: [Torch-TensorRT] - Unable to read CUDA capable devices. Return status: 999
> I0224 09:20:10.194531 1 libtorch.cc:1227] TRITONBACKEND_Initialize: pytorch
> I0224 09:20:10.194635 1 libtorch.cc:1237] Triton TRITONBACKEND API version: 1.7
> I0224 09:20:10.194639 1 libtorch.cc:1243] 'pytorch' TRITONBACKEND API version: 1.7
> 2022-02-24 09:20:10.482327: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
> 2022-02-24 09:20:10.533967: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
> I0224 09:20:10.534722 1 tensorflow.cc:2176] TRITONBACKEND_Initialize: tensorflow
> I0224 09:20:10.534746 1 tensorflow.cc:2186] Triton TRITONBACKEND API version: 1.7
> I0224 09:20:10.534749 1 tensorflow.cc:2192] 'tensorflow' TRITONBACKEND API version: 1.7
> I0224 09:20:10.534752 1 tensorflow.cc:2216] backend configuration:
> {}
> I0224 09:20:10.546856 1 onnxruntime.cc:2232] TRITONBACKEND_Initialize: onnxruntime
> I0224 09:20:10.546921 1 onnxruntime.cc:2242] Triton TRITONBACKEND API version: 1.7
> I0224 09:20:10.546924 1 onnxruntime.cc:2248] 'onnxruntime' TRITONBACKEND API version: 1.7
> I0224 09:20:10.546927 1 onnxruntime.cc:2278] backend configuration:
> {}
> W0224 09:20:10.563170 1 pinned_memory_manager.cc:236] Unable to allocate pinned system memory, pinned memory pool will not be available: unknown error
> E0224 09:20:10.563244 1 server.cc:198] Failed to initialize CUDA memory manager: unable to get number of CUDA devices: unknown error
> W0224 09:20:10.563249 1 server.cc:205] failed to enable peer access for some device pairs
> E0224 09:20:10.584340 1 model_repository_manager.cc:1844] Poll failed for model directory 'densenet_onnx': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.596656 1 model_repository_manager.cc:1844] Poll failed for model directory 'inception_graphdef': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.607955 1 model_repository_manager.cc:1844] Poll failed for model directory 'simple': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.619405 1 model_repository_manager.cc:1844] Poll failed for model directory 'simple_dyna_sequence': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.632553 1 model_repository_manager.cc:1844] Poll failed for model directory 'simple_identity': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.640729 1 model_repository_manager.cc:1844] Poll failed for model directory 'simple_int8': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.649843 1 model_repository_manager.cc:1844] Poll failed for model directory 'simple_sequence': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.661630 1 model_repository_manager.cc:1844] Poll failed for model directory 'simple_string': unable to get number of CUDA devices: unknown error
> I0224 09:20:10.661776 1 server.cc:519]
> +------------------+------+
> | Repository Agent | Path |
> +------------------+------+
> +------------------+------+
>
> I0224 09:20:10.661800 1 server.cc:546]
> +-------------+-----------------------------------------------------------------+--------+
> | Backend | Path | Config |
> +-------------+-----------------------------------------------------------------+--------+
> | pytorch | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so | {} |
> | tensorflow | /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so | {} |
> | onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {} |
> +-------------+-----------------------------------------------------------------+--------+
>
> I0224 09:20:10.661807 1 server.cc:589]
> +-------+---------+--------+
> | Model | Version | Status |
> +-------+---------+--------+
> +-------+---------+--------+
>
> I0224 09:20:10.661952 1 tritonserver.cc:1865]
> +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> | Option | Value |
> +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> | server_id | triton |
> | server_version | 2.18.0 |
> | server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
> | model_repository_path[0] | /models |
> | model_control_mode | MODE_NONE |
> | strict_model_config | 1 |
> | rate_limit | OFF |
> | pinned_memory_pool_byte_size | 268435456 |
> | response_cache_byte_size | 0 |
> | min_supported_compute_capability | 6.0 |
> | strict_readiness | 1 |
> | exit_timeout | 30 |
> +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
>
> I0224 09:20:10.662202 1 server.cc:249] Waiting for in-flight requests to complete.
> I0224 09:20:10.662208 1 server.cc:264] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
> error: creating server: Internal - failed to load all models
Triton Information

Version: 22.01. I am using a Mac M1 Pro for the local setup.
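A note on the failure mode: the repeated `Poll failed for model directory ... unable to get number of CUDA devices: unknown error` lines show the CUDA runtime probe itself failing (consistent with the earlier Torch-TensorRT warning, `Return status: 999`). On a plain x86 machine without a GPU this image typically starts anyway and falls back to CPU; on an M1 Mac the amd64 container runs under emulation, which is the likely culprit here. Independently of that, models can be pinned to CPU explicitly in their config.pbtxt. A minimal sketch using Triton's documented instance_group setting (the model name and platform match the densenet_onnx example):

```
name: "densenet_onnx"
platform: "onnxruntime_onnx"
instance_group [
  {
    # KIND_CPU keeps this model's instances on CPU even if CUDA is unusable
    count: 1
    kind: KIND_CPU
  }
]
```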
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Yes, sorry for the delay. Here is the release I've built: https://hub.docker.com/repository/docker/prometeiads/tritonserver. I'll update it with the latest version.
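A hypothetical invocation of that prebuilt image on an M1 machine (the image tag is a placeholder; check the Docker Hub page above for published tags):

```
docker run --rm --platform=linux/amd64 \
  -p8000:8000 -p8001:8001 -p8002:8002 \
  -v$PWD/model_repository:/models \
  prometeiads/tritonserver:<tag> tritonserver --model-repository=/models
```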
Hi @jbkyang-nvi, I'll share the whole repository so that you can take a look. Anyway, I solved it by building a Docker image with compose.py on my M1 Mac and specifying --platform=linux/amd64 in the docker build command.
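For reference, a rough sketch of that workflow, assuming the compose.py utility from the triton-inference-server/server repository at the r22.01 tag (flag names are taken from that era's utility and may have changed; the image name and mounted path are placeholders):

```
# Get the server repo at the matching release (branch name assumed)
git clone -b r22.01 https://github.com/triton-inference-server/server.git
cd server

# Emit a Dockerfile for an image with only the needed backends;
# --dry-run writes Dockerfile.compose instead of building directly
python3 compose.py --backend onnxruntime --backend tensorflow1 --backend pytorch \
    --container-version 22.01 --dry-run

# Build explicitly for amd64, then run the composed image as usual
docker build --platform=linux/amd64 -f Dockerfile.compose -t tritonserver_cpu .
docker run --rm --platform=linux/amd64 -p8000:8000 -p8001:8001 -p8002:8002 \
    -v$PWD/docs/examples/model_repository:/models \
    tritonserver_cpu tritonserver --model-repository=/models
```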