CPU-only mode: unable to load models, got CUDA error
Problem Description

I was trying to follow the official example, starting the server on a CPU-only device by calling the command:
docker run --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/Users/tamannaverma/triton-inference-server/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:22.01-py3 tritonserver --model-repository=/models
Here are the logs:
> =============================
> == Triton Inference Server ==
> =============================
>
> NVIDIA Release 22.01 (build 31237564)
>
> Copyright (c) 2018-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
>
> Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
>
> This container image and its contents are governed by the NVIDIA Deep Learning Container License.
> By pulling and using the container, you accept the terms and conditions of this license:
> https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
> find: File system loop detected; '/usr/local/cuda-11.6/compat/lib.real/lib.real' is part of the same file system loop as '/usr/local/cuda-11.6/compat/lib.real'.
> find: File system loop detected; '/usr/local/cuda-11.6/compat/lib/lib.real' is part of the same file system loop as '/usr/local/cuda-11.6/compat/lib'.
> find: File system loop detected; '/usr/local/cuda-11/compat/lib.real/lib.real' is part of the same file system loop as '/usr/local/cuda-11/compat/lib.real'.
> find: File system loop detected; '/usr/local/cuda-11/compat/lib/lib.real' is part of the same file system loop as '/usr/local/cuda-11/compat/lib'.
> find: File system loop detected; '/usr/local/cuda/compat/lib.real/lib.real' is part of the same file system loop as '/usr/local/cuda/compat/lib.real'.
> find: File system loop detected; '/usr/local/cuda/compat/lib/lib.real' is part of the same file system loop as '/usr/local/cuda/compat/lib'.
>
> WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
> Use Docker with NVIDIA Container Toolkit to start this container; see
> https://github.com/NVIDIA/nvidia-docker.
>
> WARNING: [Torch-TensorRT] - Unable to read CUDA capable devices. Return status: 999
> I0224 09:20:10.194531 1 libtorch.cc:1227] TRITONBACKEND_Initialize: pytorch
> I0224 09:20:10.194635 1 libtorch.cc:1237] Triton TRITONBACKEND API version: 1.7
> I0224 09:20:10.194639 1 libtorch.cc:1243] 'pytorch' TRITONBACKEND API version: 1.7
> 2022-02-24 09:20:10.482327: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
> 2022-02-24 09:20:10.533967: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
> I0224 09:20:10.534722 1 tensorflow.cc:2176] TRITONBACKEND_Initialize: tensorflow
> I0224 09:20:10.534746 1 tensorflow.cc:2186] Triton TRITONBACKEND API version: 1.7
> I0224 09:20:10.534749 1 tensorflow.cc:2192] 'tensorflow' TRITONBACKEND API version: 1.7
> I0224 09:20:10.534752 1 tensorflow.cc:2216] backend configuration:
> {}
> I0224 09:20:10.546856 1 onnxruntime.cc:2232] TRITONBACKEND_Initialize: onnxruntime
> I0224 09:20:10.546921 1 onnxruntime.cc:2242] Triton TRITONBACKEND API version: 1.7
> I0224 09:20:10.546924 1 onnxruntime.cc:2248] 'onnxruntime' TRITONBACKEND API version: 1.7
> I0224 09:20:10.546927 1 onnxruntime.cc:2278] backend configuration:
> {}
> W0224 09:20:10.563170 1 pinned_memory_manager.cc:236] Unable to allocate pinned system memory, pinned memory pool will not be available: unknown error
> E0224 09:20:10.563244 1 server.cc:198] Failed to initialize CUDA memory manager: unable to get number of CUDA devices: unknown error
> W0224 09:20:10.563249 1 server.cc:205] failed to enable peer access for some device pairs
> E0224 09:20:10.584340 1 model_repository_manager.cc:1844] Poll failed for model directory 'densenet_onnx': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.596656 1 model_repository_manager.cc:1844] Poll failed for model directory 'inception_graphdef': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.607955 1 model_repository_manager.cc:1844] Poll failed for model directory 'simple': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.619405 1 model_repository_manager.cc:1844] Poll failed for model directory 'simple_dyna_sequence': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.632553 1 model_repository_manager.cc:1844] Poll failed for model directory 'simple_identity': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.640729 1 model_repository_manager.cc:1844] Poll failed for model directory 'simple_int8': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.649843 1 model_repository_manager.cc:1844] Poll failed for model directory 'simple_sequence': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.661630 1 model_repository_manager.cc:1844] Poll failed for model directory 'simple_string': unable to get number of CUDA devices: unknown error
> I0224 09:20:10.661776 1 server.cc:519]
> +------------------+------+
> | Repository Agent | Path |
> +------------------+------+
> +------------------+------+
>
> I0224 09:20:10.661800 1 server.cc:546]
> +-------------+-----------------------------------------------------------------+--------+
> | Backend | Path | Config |
> +-------------+-----------------------------------------------------------------+--------+
> | pytorch | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so | {} |
> | tensorflow | /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so | {} |
> | onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {} |
> +-------------+-----------------------------------------------------------------+--------+
>
> I0224 09:20:10.661807 1 server.cc:589]
> +-------+---------+--------+
> | Model | Version | Status |
> +-------+---------+--------+
> +-------+---------+--------+
>
> I0224 09:20:10.661952 1 tritonserver.cc:1865]
> +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> | Option | Value |
> +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> | server_id | triton |
> | server_version | 2.18.0 |
> | server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
> | model_repository_path[0] | /models |
> | model_control_mode | MODE_NONE |
> | strict_model_config | 1 |
> | rate_limit | OFF |
> | pinned_memory_pool_byte_size | 268435456 |
> | response_cache_byte_size | 0 |
> | min_supported_compute_capability | 6.0 |
> | strict_readiness | 1 |
> | exit_timeout | 30 |
> +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
>
> I0224 09:20:10.662202 1 server.cc:249] Waiting for in-flight requests to complete.
> I0224 09:20:10.662208 1 server.cc:264] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
> error: creating server: Internal - failed to load all models
Triton Information

Version: 22.01. I am using a Mac M1 Pro for the local setup.
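A note on the failure mode: the repeated `Poll failed for model directory ... unable to get number of CUDA devices: unknown error` lines show the CUDA runtime probe itself failing (consistent with the earlier Torch-TensorRT warning, `Return status: 999`). On a plain x86 machine without a GPU this image typically starts anyway and falls back to CPU; on an M1 Mac the amd64 container runs under emulation, which is the likely culprit here. Independently of that, models can be pinned to CPU explicitly in their config.pbtxt. A minimal sketch using Triton's documented instance_group setting (the model name and platform match the densenet_onnx example):

```
name: "densenet_onnx"
platform: "onnxruntime_onnx"
instance_group [
  {
    # KIND_CPU keeps this model's instances on CPU even if CUDA is unusable
    count: 1
    kind: KIND_CPU
  }
]
```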
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Yes, sorry for the delay. Here is the release I've built: https://hub.docker.com/repository/docker/prometeiads/tritonserver. I'll update it with the latest version.
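A hypothetical invocation of that prebuilt image on an M1 machine (the image tag is a placeholder; check the Docker Hub page above for published tags):

```
docker run --rm --platform=linux/amd64 \
  -p8000:8000 -p8001:8001 -p8002:8002 \
  -v$PWD/model_repository:/models \
  prometeiads/tritonserver:<tag> tritonserver --model-repository=/models
```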
Hi @jbkyang-nvi, I'll share the whole repository so that you can take a look. Anyway, I solved it by building a Docker image with compose.py on my M1 Mac and specifying --platform=linux/amd64 in the docker build command.
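For reference, a rough sketch of that workflow, assuming the compose.py utility from the triton-inference-server/server repository at the r22.01 tag (flag names are taken from that era's utility and may have changed; the image name and mounted path are placeholders):

```
# Get the server repo at the matching release (branch name assumed)
git clone -b r22.01 https://github.com/triton-inference-server/server.git
cd server

# Emit a Dockerfile for an image with only the needed backends;
# --dry-run writes Dockerfile.compose instead of building directly
python3 compose.py --backend onnxruntime --backend tensorflow1 --backend pytorch \
    --container-version 22.01 --dry-run

# Build explicitly for amd64, then run the composed image as usual
docker build --platform=linux/amd64 -f Dockerfile.compose -t tritonserver_cpu .
docker run --rm --platform=linux/amd64 -p8000:8000 -p8001:8001 -p8002:8002 \
    -v$PWD/docs/examples/model_repository:/models \
    tritonserver_cpu tritonserver --model-repository=/models
```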