pinned_memory_manager Killed
Description
I want to deploy Triton Server via Azure Kubernetes Service (AKS). The target node is an ND96asr v4, which is equipped with 8 A100 GPUs. Triton Server cannot start up successfully, even without loading any models.
Triton Information
- triton: nvcr.io/nvidia/tritonserver:21.07-py3
- azure: ND96asr v4
To Reproduce
- Prepare the cluster. To create the cluster, follow the procedure in the Azure GPU cluster article: https://docs.microsoft.com/ja-jp/azure/aks/gpu-cluster.
az aks nodepool add \
--resource-group myResourceGroup \
--cluster-name myAKSCluster \
--name gpunp \
--node-count 1 \
--node-vm-size Standard_NC6 \
--node-taints sku=gpu:NoSchedule \
--aks-custom-headers UseGPUDedicatedVHD=true,usegen2vm=true
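Note that the article's example uses Standard_NC6; for the ND96asr v4 node described above, the VM size would need to be swapped, presumably along these lines (the other flags are assumed to carry over unchanged):

az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name gpunp \
  --node-count 1 \
  --node-vm-size Standard_ND96asr_v4 \
  --node-taints sku=gpu:NoSchedule \
  --aks-custom-headers UseGPUDedicatedVHD=true,usegen2vm=true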
- Deploy via deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-triton-ft
  namespace: modules-gpt3-6b
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sample
  template:
    metadata:
      labels:
        app: sample
    spec:
      containers:
      - name: sample
        image: nvcr.io/nvidia/tritonserver:21.07-py3
        command: ["/bin/sh"]
        args: ["-c", "while true; do sleep 10; done"]
      tolerations:
      - key: "sku"
        operator: "Equal"
        value: "gpu"
        effect: "NoSchedule"
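This manifest sets no resource requests or limits and requests no GPUs; depending on how the node pool and device plugin are configured, the container can land in a constrained cgroup and be reaped by the kernel OOM killer, which would match the signal 9 seen below. A hedged sketch of a resources stanza that could be added to the container above (the memory and GPU values are illustrative assumptions, not taken from the original issue):

        resources:
          requests:
            memory: "64Gi"
            nvidia.com/gpu: 8
          limits:
            memory: "64Gi"
            nvidia.com/gpu: 8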
- Log in to the pod and run:
mpirun -n 1 --allow-run-as-root tritonserver --model-repository=/
- Confirm the output:
root@sample2-7cb48985d9-lgzfc:/opt/tritonserver# mpirun -n 1 --allow-run-as-root tritonserver --model-repository=/
I0404 12:44:49.449929 92 metrics.cc:290] Collecting metrics for GPU 0: NVIDIA A100-SXM4-40GB
I0404 12:44:49.450370 92 metrics.cc:290] Collecting metrics for GPU 1: NVIDIA A100-SXM4-40GB
I0404 12:44:49.450406 92 metrics.cc:290] Collecting metrics for GPU 2: NVIDIA A100-SXM4-40GB
I0404 12:44:49.450431 92 metrics.cc:290] Collecting metrics for GPU 3: NVIDIA A100-SXM4-40GB
I0404 12:44:49.450454 92 metrics.cc:290] Collecting metrics for GPU 4: NVIDIA A100-SXM4-40GB
I0404 12:44:49.450483 92 metrics.cc:290] Collecting metrics for GPU 5: NVIDIA A100-SXM4-40GB
I0404 12:44:49.450504 92 metrics.cc:290] Collecting metrics for GPU 6: NVIDIA A100-SXM4-40GB
I0404 12:44:49.450531 92 metrics.cc:290] Collecting metrics for GPU 7: NVIDIA A100-SXM4-40GB
I0404 12:44:50.485665 92 libtorch.cc:998] TRITONBACKEND_Initialize: pytorch
I0404 12:44:50.485729 92 libtorch.cc:1008] Triton TRITONBACKEND API version: 1.4
I0404 12:44:50.485738 92 libtorch.cc:1014] 'pytorch' TRITONBACKEND API version: 1.4
2022-04-04 12:44:51.056099: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0404 12:44:51.247146 92 tensorflow.cc:2169] TRITONBACKEND_Initialize: tensorflow
I0404 12:44:51.247200 92 tensorflow.cc:2179] Triton TRITONBACKEND API version: 1.4
I0404 12:44:51.247209 92 tensorflow.cc:2185] 'tensorflow' TRITONBACKEND API version: 1.4
I0404 12:44:51.247216 92 tensorflow.cc:2209] backend configuration:
{}
I0404 12:44:51.249647 92 onnxruntime.cc:1970] TRITONBACKEND_Initialize: onnxruntime
I0404 12:44:51.249678 92 onnxruntime.cc:1980] Triton TRITONBACKEND API version: 1.4
I0404 12:44:51.249687 92 onnxruntime.cc:1986] 'onnxruntime' TRITONBACKEND API version: 1.4
I0404 12:44:51.343681 92 openvino.cc:1193] TRITONBACKEND_Initialize: openvino
I0404 12:44:51.343707 92 openvino.cc:1203] Triton TRITONBACKEND API version: 1.4
I0404 12:44:51.343715 92 openvino.cc:1209] 'openvino' TRITONBACKEND API version: 1.4
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node sample2-7cb48985d9-lgzfc exited on signal 9 (Killed).
--------------------------------------------------------------------------
When starting the server without mpirun, Killed is observed as well:
root@sample2-7cb48985d9-lgzfc:/opt/tritonserver# tritonserver --model-repository=/a
I0404 12:57:33.566547 197 metrics.cc:290] Collecting metrics for GPU 0: NVIDIA A100-SXM4-40GB
I0404 12:57:33.566814 197 metrics.cc:290] Collecting metrics for GPU 1: NVIDIA A100-SXM4-40GB
I0404 12:57:33.566832 197 metrics.cc:290] Collecting metrics for GPU 2: NVIDIA A100-SXM4-40GB
I0404 12:57:33.566844 197 metrics.cc:290] Collecting metrics for GPU 3: NVIDIA A100-SXM4-40GB
I0404 12:57:33.566856 197 metrics.cc:290] Collecting metrics for GPU 4: NVIDIA A100-SXM4-40GB
I0404 12:57:33.566870 197 metrics.cc:290] Collecting metrics for GPU 5: NVIDIA A100-SXM4-40GB
I0404 12:57:33.566880 197 metrics.cc:290] Collecting metrics for GPU 6: NVIDIA A100-SXM4-40GB
I0404 12:57:33.566893 197 metrics.cc:290] Collecting metrics for GPU 7: NVIDIA A100-SXM4-40GB
I0404 12:57:34.057968 197 libtorch.cc:998] TRITONBACKEND_Initialize: pytorch
I0404 12:57:34.058020 197 libtorch.cc:1008] Triton TRITONBACKEND API version: 1.4
I0404 12:57:34.058025 197 libtorch.cc:1014] 'pytorch' TRITONBACKEND API version: 1.4
2022-04-04 12:57:34.267157: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0404 12:57:34.351845 197 tensorflow.cc:2169] TRITONBACKEND_Initialize: tensorflow
I0404 12:57:34.351893 197 tensorflow.cc:2179] Triton TRITONBACKEND API version: 1.4
I0404 12:57:34.351908 197 tensorflow.cc:2185] 'tensorflow' TRITONBACKEND API version: 1.4
I0404 12:57:34.351912 197 tensorflow.cc:2209] backend configuration:
{}
I0404 12:57:34.353170 197 onnxruntime.cc:1970] TRITONBACKEND_Initialize: onnxruntime
I0404 12:57:34.353190 197 onnxruntime.cc:1980] Triton TRITONBACKEND API version: 1.4
I0404 12:57:34.353200 197 onnxruntime.cc:1986] 'onnxruntime' TRITONBACKEND API version: 1.4
I0404 12:57:34.376199 197 openvino.cc:1193] TRITONBACKEND_Initialize: openvino
I0404 12:57:34.376221 197 openvino.cc:1203] Triton TRITONBACKEND API version: 1.4
I0404 12:57:34.376225 197 openvino.cc:1209] 'openvino' TRITONBACKEND API version: 1.4
Killed
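A process killed with signal 9 and no preceding error from Triton itself typically points at the kernel OOM killer or a cgroup memory limit rather than a crash inside the server. A minimal diagnostic sketch to run inside the pod (assuming cgroup v1, which AKS node images of this era use):

# Did the kernel OOM killer reap the process?
dmesg | grep -i -E 'killed process|out of memory'

# What memory limit does the container's cgroup impose?
cat /sys/fs/cgroup/memory/memory.limit_in_bytes

# Peak memory usage observed in this cgroup so far
cat /sys/fs/cgroup/memory/memory.max_usage_in_bytes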
Expected behavior: the server starts up successfully. For comparison, the following output is from a node with 1 GPU:
root@gpt1b:/workspace# mpirun -n 1 --allow-run-as-root tritonserver --model-repository=/a
I0404 11:55:52.082112 69 metrics.cc:290] Collecting metrics for GPU 0: Tesla V100-PCIE-16GB
I0404 11:55:52.375557 69 libtorch.cc:998] TRITONBACKEND_Initialize: pytorch
I0404 11:55:52.375599 69 libtorch.cc:1008] Triton TRITONBACKEND API version: 1.4
I0404 11:55:52.375605 69 libtorch.cc:1014] 'pytorch' TRITONBACKEND API version: 1.4
2022-04-04 11:55:52.524003: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0404 11:55:52.570841 69 tensorflow.cc:2169] TRITONBACKEND_Initialize: tensorflow
I0404 11:55:52.570874 69 tensorflow.cc:2179] Triton TRITONBACKEND API version: 1.4
I0404 11:55:52.570880 69 tensorflow.cc:2185] 'tensorflow' TRITONBACKEND API version: 1.4
I0404 11:55:52.570884 69 tensorflow.cc:2209] backend configuration:
{}
I0404 11:55:52.573942 69 onnxruntime.cc:1970] TRITONBACKEND_Initialize: onnxruntime
I0404 11:55:52.573973 69 onnxruntime.cc:1980] Triton TRITONBACKEND API version: 1.4
I0404 11:55:52.573979 69 onnxruntime.cc:1986] 'onnxruntime' TRITONBACKEND API version: 1.4
I0404 11:55:52.595485 69 openvino.cc:1193] TRITONBACKEND_Initialize: openvino
I0404 11:55:52.595508 69 openvino.cc:1203] Triton TRITONBACKEND API version: 1.4
I0404 11:55:52.595513 69 openvino.cc:1209] 'openvino' TRITONBACKEND API version: 1.4
I0404 11:55:53.062644 69 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f945c000000' with size 268435456
I0404 11:55:53.063056 69 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0404 11:55:53.063869 69 server.cc:504]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0404 11:55:53.063923 69 server.cc:543]
+-------------+-----------------------------------------------------------------+--------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+--------+
| tensorrt | <built-in> | {} |
| pytorch | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so | {} |
| tensorflow | /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so | {} |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {} |
| openvino | /opt/tritonserver/backends/openvino/libtriton_openvino.so | {} |
+-------------+-----------------------------------------------------------------+--------+
I0404 11:55:53.063941 69 server.cc:586]
+-------+---------+--------+
| Model | Version | Status |
+-------+---------+--------+
+-------+---------+--------+
I0404 11:55:53.064038 69 tritonserver.cc:1718]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.12.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
| model_repository_path[0] | /a |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0404 11:55:53.065759 69 grpc_server.cc:4072] Started GRPCInferenceService at 0.0.0.0:8001
I0404 11:55:53.065984 69 http_server.cc:2795] Started HTTPService at 0.0.0.0:8000
I0404 11:55:53.107932 69 sagemaker_server.cc:134] Started Sagemaker HTTPService at 0.0.0.0:8080
I0404 11:55:53.160626 69 http_server.cc:162] Started Metrics Service at 0.0.0.0:8002
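Once the services above are listening, readiness can be confirmed through Triton's standard HTTP health endpoint:

curl -v localhost:8000/v2/health/ready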
Top GitHub Comments
Marking this as a bug; we will investigate further why Triton is going OOM.
I see. One more experiment: can you try the following command? Do you still see the failure? You can read more about --cuda-memory-pool-byte-size here: https://github.com/triton-inference-server/server/blob/main/src/main.cc#L555
64 MB should not be a big deal for 40 GB GPUs and a 900 GB machine. Most likely it is an issue with your environment; we are trying to narrow it down with these experiments.
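The exact command from that comment is not preserved above, but given the pointer to --cuda-memory-pool-byte-size, the experiment was presumably along these lines: shrink or disable the pinned and per-device CUDA memory pools so that startup pre-allocates almost nothing (flag syntax as documented in Triton's main.cc; the zero values are assumptions for the experiment):

tritonserver --model-repository=/a \
  --pinned-memory-pool-byte-size=0 \
  --cuda-memory-pool-byte-size=0:0

# A related experiment: expose only one GPU to rule out the
# 8-GPU initialization path, mirroring the working 1-GPU node
CUDA_VISIBLE_DEVICES=0 tritonserver --model-repository=/a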