Cannot get CUDA device count, GPU metrics will not be available on multi-GPU node
Description
I want to deploy Triton server via Azure Kubernetes Service. My target node is ND96asr v4, which is equipped with 8 A100 GPUs. When I run Triton server without loading any models, the following messages are displayed:
root@fastertransformer-7dd47c77bb-46gpb:/workspace# mpirun -n 1 --allow-run-as-root tritonserver --model-repository=/workspace
W0221 16:43:52.559411 1908 metrics.cc:274] Cannot get CUDA device count, GPU metrics will not be available
I0221 16:43:52.791832 1908 libtorch.cc:998] TRITONBACKEND_Initialize: pytorch
I0221 16:43:52.791877 1908 libtorch.cc:1008] Triton TRITONBACKEND API version: 1.4
(※ /workspace is an empty directory.) Among these messages,
Cannot get CUDA device count, GPU metrics will not be available
is a problem for loading models. I assume the problem is caused by the docker image, because the following is printed at startup:
=============================
== Triton Inference Server ==
=============================
NVIDIA Release 21.07 (build 24810355)
Copyright (c) 2018-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
ERROR: No supported GPU(s) detected to run this container
So "ERROR: No supported GPU(s) detected to run this container" is obtained, even though I can execute nvidia-smi inside the container:
root@fastertransformer-749fc45c48-hdjhq:/workspace# nvidia-smi
Mon Feb 21 20:22:29 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A100-SXM... Off | 00000001:00:00.0 Off | 0 |
| N/A 41C P0 49W / 400W | 0MiB / 40536MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-SXM... Off | 00000002:00:00.0 Off | 0 |
| N/A 40C P0 54W / 400W | 0MiB / 40536MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA A100-SXM... Off | 00000003:00:00.0 Off | 0 |
| N/A 40C P0 52W / 400W | 0MiB / 40536MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA A100-SXM... Off | 00000004:00:00.0 Off | 0 |
| N/A 41C P0 53W / 400W | 0MiB / 40536MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 4 NVIDIA A100-SXM... Off | 0000000B:00:00.0 Off | 0 |
| N/A 41C P0 57W / 400W | 0MiB / 40536MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 5 NVIDIA A100-SXM... Off | 0000000C:00:00.0 Off | 0 |
| N/A 39C P0 50W / 400W | 0MiB / 40536MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 6 NVIDIA A100-SXM... Off | 0000000D:00:00.0 Off | 0 |
| N/A 40C P0 50W / 400W | 0MiB / 40536MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 7 NVIDIA A100-SXM... Off | 0000000E:00:00.0 Off | 0 |
| N/A 41C P0 53W / 400W | 0MiB / 40536MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
How can I fix this? For comparison, I also tried deploying to a machine equipped with a single T4, and there the startup succeeds.
root@fastertransformer-cc8dbdf6-vbp44:/workspace# nvidia-smi
Tue Feb 22 01:39:26 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000001:00:00.0 Off | Off |
| N/A 32C P8 9W / 70W | 0MiB / 16127MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
root@fastertransformer-cc8dbdf6-vbp44:/workspace# mpirun -n 1 --allow-run-as-root tritonserver --model-repository=/workspace
I0221 16:40:48.387855 61 metrics.cc:290] Collecting metrics for GPU 0: Tesla T4
I0221 16:40:48.615749 61 libtorch.cc:998] TRITONBACKEND_Initialize: pytorch
Therefore, I assume my multi-GPU settings are wrong, but I do not know what is wrong…
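One way to narrow this down is to check what the node actually advertises to Kubernetes. The commands below are only a sketch: they assume the standard NVIDIA device plugin is installed, and the node-name placeholder is hypothetical and must be replaced with the real AKS node name.
# Show the node's allocatable resources; a GPU node should list nvidia.com/gpu
kubectl describe node <node-name> | grep -A 8 "Allocatable:"
# Check that the NVIDIA device plugin pod is running (namespace/name may vary by install)
kubectl get pods -A | grep -i nvidia-device-plugin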
Triton Information
- docker image: nvcr.io/nvidia/tritonserver:21.07-py3
- nvidia driver: 470.57.02
- CUDA: 11.4
- K8S: 1.22.4
- Node Image: AKSUbuntu-1804gen2containerd-2022.02.01
- Node Size: Standard_ND96asr_v4
To Reproduce
Run the Triton server image nvcr.io/nvidia/tritonserver:21.07-py3 on an ND96asr v4 node via AKS.
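For reference, a minimal pod manifest sketch that explicitly requests the GPUs is shown below. It is illustrative only: the pod name is hypothetical, and it assumes the NVIDIA device plugin exposes the GPUs as the nvidia.com/gpu resource (the usual setup on AKS GPU node pools).
# Illustrative only; apply with kubectl
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: triton-multigpu-test        # hypothetical name
spec:
  containers:
  - name: tritonserver
    image: nvcr.io/nvidia/tritonserver:21.07-py3
    # Same command as in the issue, passed to the image's entrypoint
    args: ["mpirun", "-n", "1", "--allow-run-as-root",
           "tritonserver", "--model-repository=/workspace"]
    resources:
      limits:
        nvidia.com/gpu: 8           # expose all 8 A100s to the container
EOF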
Expected behavior
As in the single-T4 case, Triton server should be able to collect GPU metrics:
root@fastertransformer-cc8dbdf6-vbp44:/workspace# mpirun -n 1 --allow-run-as-root tritonserver --model-repository=/workspace
I0221 16:40:48.387855 61 metrics.cc:290] Collecting metrics for GPU 0: Tesla T4
I0221 16:40:48.615749 61 libtorch.cc:998] TRITONBACKEND_Initialize: pytorch
I0221 16:40:48.615782 61 libtorch.cc:1008] Triton TRITONBACKEND API version: 1.4
I0221 16:40:48.615786 61 libtorch.cc:1014] 'pytorch' TRITONBACKEND API version: 1.4
...
(※ /workspace is an empty directory.)
Top GitHub Comments
Please do not reopen issues for new questions. Unless the original question needs follow-up, we ask that you open a new issue.
For nvidia-smi, you’ll want to check out their documentation and resources. Triton works fine on devices where it cannot retrieve GPU metrics. And I see FasterTransformer’s performance example uses an A100, though you can also check with them. The best way to see both is to run an inference request.
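If it helps, a quick way to confirm the server is serving and to inspect whatever metrics it does expose is to query Triton's HTTP endpoints. The commands below are a sketch and assume the default ports (8000 for HTTP/health, 8002 for metrics) with the server reachable on localhost.
# Readiness check; an HTTP 200 means the server is up and ready
curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/health/ready
# Dump GPU-related metrics, if any are being collected
curl -s localhost:8002/metrics | grep nv_gpu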
I’m closing this ticket. If you have any follow-up to the new questions or any additional questions, please open a new issue for those.
@dyastremsky Thank you for your reply. I’m sorry for overlooking the known issues; I will wait for the fix!