
pinned_memory_manager Killed

See original GitHub issue

Description

I want to deploy Triton Server via Azure Kubernetes Service (AKS). The target node is an ND96asr v4, which is equipped with 8 A100 GPUs. Triton Server cannot start up successfully, even without loading any models.

Triton Information

  • Triton image: nvcr.io/nvidia/tritonserver:21.07-py3
  • Azure VM size: ND96asr v4

To Reproduce

  1. Prepare the cluster. To create the cluster, follow the procedure in the Azure GPU cluster article: https://docs.microsoft.com/ja-jp/azure/aks/gpu-cluster. A variant of the command adjusted for the ND96asr v4 node is sketched after the block below.
az aks nodepool add \
   --resource-group myResourceGroup \
   --cluster-name myAKSCluster \
   --name gpunp \
   --node-count 1 \
   --node-vm-size Standard_NC6 \
   --node-taints sku=gpu:NoSchedule \
   --aks-custom-headers UseGPUDedicatedVHD=true,usegen2vm=true
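
The command above is copied from the Azure article and requests Standard_NC6 nodes. For the ND96asr v4 node described in this issue, the pool would presumably be created with the matching VM size; the variant below is a sketch under that assumption, and the SKU name Standard_ND96asr_v4 and the pool name a100np are illustrative rather than taken from the original report.

az aks nodepool add \
   --resource-group myResourceGroup \
   --cluster-name myAKSCluster \
   --name a100np \
   --node-count 1 \
   --node-vm-size Standard_ND96asr_v4 \
   --node-taints sku=gpu:NoSchedule \
   --aks-custom-headers UseGPUDedicatedVHD=true,usegen2vm=true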
  2. Deploy via deployment.yaml (a variant of the container spec with an explicit GPU request is sketched after the manifest):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-triton-ft
  namespace: modules-gpt3-6b
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sample
  template:
    metadata:
      labels:
        app: sample
    spec:
      containers:
      - name: sample
        image: nvcr.io/nvidia/tritonserver:21.07-py3
        command: ["/bin/sh"]
        args: ["-c", "while true; do sleep 10;done"]
      tolerations:
      - key: "sku"
        operator: "Equal"
        value: "gpu"
        effect: "NoSchedule"
  3. Log in to the pod and run mpirun -n 1 --allow-run-as-root tritonserver --model-repository=/ (a sketch of how to open a shell in the pod follows below).
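
This is a minimal sketch of how to open that shell with kubectl exec, assuming the deployment and namespace above; the generated pod name is illustrative, not from the original report.

# find the pod created by the sample-triton-ft deployment
kubectl get pods -n modules-gpt3-6b -l app=sample

# open an interactive shell inside it
kubectl exec -it -n modules-gpt3-6b sample-triton-ft-7cb48985d9-lgzfc -- /bin/bash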

  4. Confirm the output:

root@sample2-7cb48985d9-lgzfc:/opt/tritonserver# mpirun -n 1 --allow-run-as-root tritonserver --model-repository=/
I0404 12:44:49.449929 92 metrics.cc:290] Collecting metrics for GPU 0: NVIDIA A100-SXM4-40GB
I0404 12:44:49.450370 92 metrics.cc:290] Collecting metrics for GPU 1: NVIDIA A100-SXM4-40GB
I0404 12:44:49.450406 92 metrics.cc:290] Collecting metrics for GPU 2: NVIDIA A100-SXM4-40GB
I0404 12:44:49.450431 92 metrics.cc:290] Collecting metrics for GPU 3: NVIDIA A100-SXM4-40GB
I0404 12:44:49.450454 92 metrics.cc:290] Collecting metrics for GPU 4: NVIDIA A100-SXM4-40GB
I0404 12:44:49.450483 92 metrics.cc:290] Collecting metrics for GPU 5: NVIDIA A100-SXM4-40GB
I0404 12:44:49.450504 92 metrics.cc:290] Collecting metrics for GPU 6: NVIDIA A100-SXM4-40GB
I0404 12:44:49.450531 92 metrics.cc:290] Collecting metrics for GPU 7: NVIDIA A100-SXM4-40GB
I0404 12:44:50.485665 92 libtorch.cc:998] TRITONBACKEND_Initialize: pytorch
I0404 12:44:50.485729 92 libtorch.cc:1008] Triton TRITONBACKEND API version: 1.4
I0404 12:44:50.485738 92 libtorch.cc:1014] 'pytorch' TRITONBACKEND API version: 1.4
2022-04-04 12:44:51.056099: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0404 12:44:51.247146 92 tensorflow.cc:2169] TRITONBACKEND_Initialize: tensorflow
I0404 12:44:51.247200 92 tensorflow.cc:2179] Triton TRITONBACKEND API version: 1.4
I0404 12:44:51.247209 92 tensorflow.cc:2185] 'tensorflow' TRITONBACKEND API version: 1.4
I0404 12:44:51.247216 92 tensorflow.cc:2209] backend configuration:
{}
I0404 12:44:51.249647 92 onnxruntime.cc:1970] TRITONBACKEND_Initialize: onnxruntime
I0404 12:44:51.249678 92 onnxruntime.cc:1980] Triton TRITONBACKEND API version: 1.4
I0404 12:44:51.249687 92 onnxruntime.cc:1986] 'onnxruntime' TRITONBACKEND API version: 1.4
I0404 12:44:51.343681 92 openvino.cc:1193] TRITONBACKEND_Initialize: openvino
I0404 12:44:51.343707 92 openvino.cc:1203] Triton TRITONBACKEND API version: 1.4
I0404 12:44:51.343715 92 openvino.cc:1209] 'openvino' TRITONBACKEND API version: 1.4
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node sample2-7cb48985d9-lgzfc exited on signal 9 (Killed).
--------------------------------------------------------------------------

When started without mpirun, the process is likewise killed:

root@sample2-7cb48985d9-lgzfc:/opt/tritonserver# tritonserver --model-repository=/a
I0404 12:57:33.566547 197 metrics.cc:290] Collecting metrics for GPU 0: NVIDIA A100-SXM4-40GB
I0404 12:57:33.566814 197 metrics.cc:290] Collecting metrics for GPU 1: NVIDIA A100-SXM4-40GB
I0404 12:57:33.566832 197 metrics.cc:290] Collecting metrics for GPU 2: NVIDIA A100-SXM4-40GB
I0404 12:57:33.566844 197 metrics.cc:290] Collecting metrics for GPU 3: NVIDIA A100-SXM4-40GB
I0404 12:57:33.566856 197 metrics.cc:290] Collecting metrics for GPU 4: NVIDIA A100-SXM4-40GB
I0404 12:57:33.566870 197 metrics.cc:290] Collecting metrics for GPU 5: NVIDIA A100-SXM4-40GB
I0404 12:57:33.566880 197 metrics.cc:290] Collecting metrics for GPU 6: NVIDIA A100-SXM4-40GB
I0404 12:57:33.566893 197 metrics.cc:290] Collecting metrics for GPU 7: NVIDIA A100-SXM4-40GB
I0404 12:57:34.057968 197 libtorch.cc:998] TRITONBACKEND_Initialize: pytorch
I0404 12:57:34.058020 197 libtorch.cc:1008] Triton TRITONBACKEND API version: 1.4
I0404 12:57:34.058025 197 libtorch.cc:1014] 'pytorch' TRITONBACKEND API version: 1.4
2022-04-04 12:57:34.267157: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0404 12:57:34.351845 197 tensorflow.cc:2169] TRITONBACKEND_Initialize: tensorflow
I0404 12:57:34.351893 197 tensorflow.cc:2179] Triton TRITONBACKEND API version: 1.4
I0404 12:57:34.351908 197 tensorflow.cc:2185] 'tensorflow' TRITONBACKEND API version: 1.4
I0404 12:57:34.351912 197 tensorflow.cc:2209] backend configuration:
{}
I0404 12:57:34.353170 197 onnxruntime.cc:1970] TRITONBACKEND_Initialize: onnxruntime
I0404 12:57:34.353190 197 onnxruntime.cc:1980] Triton TRITONBACKEND API version: 1.4
I0404 12:57:34.353200 197 onnxruntime.cc:1986] 'onnxruntime' TRITONBACKEND API version: 1.4
I0404 12:57:34.376199 197 openvino.cc:1193] TRITONBACKEND_Initialize: openvino
I0404 12:57:34.376221 197 openvino.cc:1203] Triton TRITONBACKEND API version: 1.4
I0404 12:57:34.376225 197 openvino.cc:1209] 'openvino' TRITONBACKEND API version: 1.4
Killed
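
Signal 9 with no preceding error from Triton itself usually means the process was killed from outside, most often by the kernel OOM killer or a cgroup memory limit. The commands below are a suggested way to confirm that; they are not from the original report, and they assume dmesg is readable from inside the container. Because tritonserver is started manually inside a container whose main command is a sleep loop, the pod will not necessarily be reported as OOMKilled by Kubernetes, so the kernel log is the more reliable place to look.

# inside the pod: look for kernel OOM-killer messages around the time of the failure
dmesg | grep -i -E 'killed process|out of memory' | tail

# check total and available memory as seen from the container
free -h

# check the effective memory cgroup limit (cgroup v1 path; on cgroup v2 it is memory.max)
cat /sys/fs/cgroup/memory/memory.limit_in_bytes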

Expected behavior

The server should start up successfully. The following output is from a node with 1 GPU.

root@gpt1b:/workspace# mpirun -n 1 --allow-run-as-root tritonserver --model-repository=/a
I0404 11:55:52.082112 69 metrics.cc:290] Collecting metrics for GPU 0: Tesla V100-PCIE-16GB
I0404 11:55:52.375557 69 libtorch.cc:998] TRITONBACKEND_Initialize: pytorch
I0404 11:55:52.375599 69 libtorch.cc:1008] Triton TRITONBACKEND API version: 1.4
I0404 11:55:52.375605 69 libtorch.cc:1014] 'pytorch' TRITONBACKEND API version: 1.4
2022-04-04 11:55:52.524003: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0404 11:55:52.570841 69 tensorflow.cc:2169] TRITONBACKEND_Initialize: tensorflow
I0404 11:55:52.570874 69 tensorflow.cc:2179] Triton TRITONBACKEND API version: 1.4
I0404 11:55:52.570880 69 tensorflow.cc:2185] 'tensorflow' TRITONBACKEND API version: 1.4
I0404 11:55:52.570884 69 tensorflow.cc:2209] backend configuration:
{}
I0404 11:55:52.573942 69 onnxruntime.cc:1970] TRITONBACKEND_Initialize: onnxruntime
I0404 11:55:52.573973 69 onnxruntime.cc:1980] Triton TRITONBACKEND API version: 1.4
I0404 11:55:52.573979 69 onnxruntime.cc:1986] 'onnxruntime' TRITONBACKEND API version: 1.4
I0404 11:55:52.595485 69 openvino.cc:1193] TRITONBACKEND_Initialize: openvino
I0404 11:55:52.595508 69 openvino.cc:1203] Triton TRITONBACKEND API version: 1.4
I0404 11:55:52.595513 69 openvino.cc:1209] 'openvino' TRITONBACKEND API version: 1.4
I0404 11:55:53.062644 69 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f945c000000' with size 268435456
I0404 11:55:53.063056 69 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0404 11:55:53.063869 69 server.cc:504]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0404 11:55:53.063923 69 server.cc:543]
+-------------+-----------------------------------------------------------------+--------+
| Backend     | Path                                                            | Config |
+-------------+-----------------------------------------------------------------+--------+
| tensorrt    | <built-in>                                                      | {}     |
| pytorch     | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so         | {}     |
| tensorflow  | /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so | {}     |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {}     |
| openvino    | /opt/tritonserver/backends/openvino/libtriton_openvino.so       | {}     |
+-------------+-----------------------------------------------------------------+--------+

I0404 11:55:53.063941 69 server.cc:586]
+-------+---------+--------+
| Model | Version | Status |
+-------+---------+--------+
+-------+---------+--------+

I0404 11:55:53.064038 69 tritonserver.cc:1718]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                  |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                 |
| server_version                   | 2.12.0                                                                                                                                                                                 |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
| model_repository_path[0]         | /a                                                                                                                                                                                     |
| model_control_mode               | MODE_NONE                                                                                                                                                                              |
| strict_model_config              | 1                                                                                                                                                                                      |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                              |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                                                                               |
| min_supported_compute_capability | 6.0                                                                                                                                                                                    |
| strict_readiness                 | 1                                                                                                                                                                                      |
| exit_timeout                     | 30                                                                                                                                                                                     |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0404 11:55:53.065759 69 grpc_server.cc:4072] Started GRPCInferenceService at 0.0.0.0:8001
I0404 11:55:53.065984 69 http_server.cc:2795] Started HTTPService at 0.0.0.0:8000
I0404 11:55:53.107932 69 sagemaker_server.cc:134] Started Sagemaker HTTPService at 0.0.0.0:8080
I0404 11:55:53.160626 69 http_server.cc:162] Started Metrics Service at 0.0.0.0:8002

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 13 (7 by maintainers)

Top GitHub Comments

1 reaction
tanmayv25 commented on Apr 15, 2022

Marking it as a bug; will investigate further into why Triton is going OOM.

1 reaction
tanmayv25 commented on Apr 14, 2022

I see. One more experiment. Can you try this command?

tritonserver --model-repository=/workspace --pinned-memory-pool-byte-size=0 --cuda-memory-pool-byte-size=0:0 --cuda-memory-pool-byte-size=1:0 --cuda-memory-pool-byte-size=2:0 --cuda-memory-pool-byte-size=3:0 --cuda-memory-pool-byte-size=4:0 --cuda-memory-pool-byte-size=5:0 --cuda-memory-pool-byte-size=6:0 --cuda-memory-pool-byte-size=7:0

Do you still see the failure? Read more about --cuda-memory-pool-byte-size here: https://github.com/triton-inference-server/server/blob/main/src/main.cc#L555

64 MB should not be a big deal for 40 GB GPUs and a 900 GB machine. Most likely it is an issue with your environment; I am trying to narrow it down with these experiments.
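
If the command above starts successfully, a possible follow-up experiment (offered here as a sketch, not suggested in the thread) is to re-enable the pools one at a time to isolate which allocation triggers the kill, for example keeping the per-GPU CUDA pools disabled while restoring the default 256 MB pinned pool:

tritonserver --model-repository=/workspace \
  --pinned-memory-pool-byte-size=268435456 \
  --cuda-memory-pool-byte-size=0:0 \
  --cuda-memory-pool-byte-size=1:0 \
  --cuda-memory-pool-byte-size=2:0 \
  --cuda-memory-pool-byte-size=3:0 \
  --cuda-memory-pool-byte-size=4:0 \
  --cuda-memory-pool-byte-size=5:0 \
  --cuda-memory-pool-byte-size=6:0 \
  --cuda-memory-pool-byte-size=7:0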

