Docker fails to register CUDA shared memory
Description
The Triton client is unable to register CUDA shared memory when the script is launched directly by the docker command, although it works when the script is run from inside the container in interactive mode.
Triton Information
What version of Triton are you using?
Server: nvcr.io/nvidia/tritonserver:21.03-py3
Client: nvcr.io/nvidia/tritonserver:21.03-py3-sdk
Are you using the Triton container or did you build it yourself? I am using the Triton container
Dockerfile.client
FROM nvcr.io/nvidia/tritonserver:21.03-py3-sdk
RUN apt update && apt install -y libb64-dev ffmpeg
docker-compose.yml
version: '2.3'
services:
  triton-server:
    container_name: triton-server
    image: nvcr.io/nvidia/tritonserver:21.03-py3
    privileged: true
    runtime: nvidia
    shm_size: '2gb'
    ports:
      - "8000:8000"
      - "8001:8001"
      - "8002:8002"
    ipc: host
    ulimits:
      stack: 67108864
      memlock: -1
    environment:
      - LD_PRELOAD=/plugins/liblayerplugin.so
      - log-verbose=4
    command: bash -c "tritonserver --model-repository=/models --strict-model-config=false --grpc-infer-allocation-pool-size=16"
  triton-client:
    container_name: triton-client
    build:
      context: .
    network_mode: 'host'
    working_dir: /app/src
    depends_on:
      - triton-server
    environment:
      - log-verbose=4
    privileged: true
    runtime: nvidia
    shm_size: '2gb'
    command: bash -c "python3 simple_grpc_cudashm_client.py --verbose"
To Reproduce
Build the client container and then run `docker-compose up`. The triton-client container will execute the script simple_grpc_cudashm_client.py, but it will throw the following error:
unregister_system_shared_memory, metadata ()
triton-client |
triton-client | Unregistered all system shared memory regions
triton-client | unregister_cuda_shared_memory, metadata ()
triton-client |
triton-client | Unregistered all cuda shared memory regions
triton-client | register_cuda_shared_memory, metadata ()
triton-client | name: "output0_data"
triton-client | raw_handle: "\260iu\001\000\000\000\000\001\000\000\000\000\000\000\000\000\0002\000\000\000\000\000 \003\000\000\000\000\000\000\000\000\000\000\000\001\000\000\000\000\000\0001\000\000\000\000\000\000\000\242\000\320\301\216\000\000\\\000\000\000\000"
triton-client | byte_size: 3276800
triton-client |
triton-client | Traceback (most recent call last):
triton-client | File "simple_grpc_cudashm_client.py", line 61, in <module>
triton-client | triton_client.register_cuda_shared_memory(
triton-client | File "/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py", line 906, in register_cuda_shared_memory
triton-client | raise_error_grpc(rpc_error)
triton-client | File "/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py", line 61, in raise_error_grpc
triton-client | raise get_error_grpc(rpc_error) from None
triton-client | tritonclient.utils.InferenceServerException: [StatusCode.INVALID_ARGUMENT] failed to register CUDA shared memory region 'output0_data'
The curious thing is what happens when I run the script from inside the container. If you start the container with `docker-compose run triton-client bash` and then, from the terminal inside the container, execute `python3 simple_grpc_cudashm_client.py --verbose`, the client works as expected without errors. This is the output generated in this case:
unregister_system_shared_memory, metadata ()
Unregistered all system shared memory regions
unregister_cuda_shared_memory, metadata ()
Unregistered all cuda shared memory regions
register_cuda_shared_memory, metadata ()
name: "output0_data"
raw_handle: " 3\225\001\000\000\000\000\t\000\000\000\000\000\000\000\000\0002\000\000\000\000\000 \003\000\000\000\000\000\000\000\000\000\000\000\001\000\000\000\000\000\0001\000\000\000\000\000\000\000\304\000\320\301\216\000\000\\\000\000\000\000"
byte_size: 3276800
Registered cuda shared memory with name 'output0_data'
register_cuda_shared_memory, metadata ()
name: "input0_data"
raw_handle: " 3\225\001\000\000\000\000\t\000\000\000\000\000\000\000\000\0002\000\000\000\000\000 \003\000\000\000\000\000\000\000\000\000\000\000\001\000\000\000\000\000\0002\000\000\000\000\000\000\000\304\000\320\301\220\000\000\\\000\000\000\000"
byte_size: 3276800
Registered Cuda shared memory with name 'input0_data'
It's important to note that running `docker-compose run triton-client python3 simple_grpc_cudashm_client.py --verbose` also generates the same error.
Attachments
Script simple_grpc_cudashm_client.py
import os
import json
import sys
import argparse
import numpy as np
import tritonclient.grpc as grpcclient
from tritonclient import utils
import tritonclient.utils.cuda_shared_memory as cudashm
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-v',
                        '--verbose',
                        action="store_true",
                        required=False,
                        default=False,
                        help='Enable verbose output')
    parser.add_argument('-u',
                        '--url',
                        type=str,
                        required=False,
                        default='localhost:8001',
                        help='Inference server URL. Default is localhost:8001.')
    FLAGS = parser.parse_args()

    try:
        triton_client = grpcclient.InferenceServerClient(url=FLAGS.url,
                                                         verbose=FLAGS.verbose)
    except Exception as e:
        print("channel creation failed: " + str(e))
        sys.exit(1)

    triton_client.unregister_system_shared_memory()
    triton_client.unregister_cuda_shared_memory()

    model_name = "test"
    model_version = "latest"

    input_byte_size = 3276800  # 1600x512x4bytes
    output_byte_size = input_byte_size

    shm_op0_handle = cudashm.create_shared_memory_region(
        "output0_data", output_byte_size, 0)
    triton_client.register_cuda_shared_memory(
        "output0_data", cudashm.get_raw_handle(shm_op0_handle), 0,
        output_byte_size)

    shm_ip0_handle = cudashm.create_shared_memory_region(
        "input0_data", input_byte_size, 0)
    triton_client.register_cuda_shared_memory(
        "input0_data", cudashm.get_raw_handle(shm_ip0_handle), 0,
        input_byte_size)
Does anyone have an idea why this happens when launching the triton-client with docker-compose up?
Thanks
Top GitHub Comments
I was able to root cause this issue. It looks like the problem is that cudaIpcOpenMemHandle will return an invalid context if the handle's source PID matches the destination process PID. When using Docker, each container has its own process ID namespace, which allows the same process IDs to be reused in both containers. When interactive mode is used, the PIDs end up being different and there are no issues. The fix would be to add the --pid host flag to both of the containers. I was able to reproduce the issue locally and confirm that adding the flag fixes the issue. A minimal compose sketch of that change is shown below.
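As an illustration of that fix (a sketch, not taken from the original issue): in the docker-compose.yml above, the Compose equivalent of the --pid host flag is the service-level pid: host option, added to both services. Only the relevant keys are shown:

```yaml
# Sketch of the suggested fix: run both containers in the host PID namespace
# so the client and server processes cannot end up with the same (namespaced)
# PID, which is what makes cudaIpcOpenMemHandle return an invalid context.
version: '2.3'
services:
  triton-server:
    image: nvcr.io/nvidia/tritonserver:21.03-py3
    pid: host          # Compose equivalent of `docker run --pid host`
    # ... remaining triton-server settings unchanged ...
  triton-client:
    build:
      context: .
    pid: host          # Compose equivalent of `docker run --pid host`
    # ... remaining triton-client settings unchanged ...
```

With this in place, `docker-compose up` should register the CUDA shared memory regions the same way the interactive `docker-compose run` session does.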
Hi @rmccorm4
I've been able to reproduce this issue using the 'simple' model, running the docker image nvcr.io/nvidia/tritonserver:22.04-py3. It happens when I run both the server and the client in non-interactive mode.
To reproduce, cd to server/docs/examples/model_repository (I ran fetch_models.sh instead of deleting the rest), then:
- Start the triton-server
- Start the triton-client test
This fails with the error message:
If I start either the server or the triton client in interactive mode, the test passes. For example, I could start the server like this:
I have verified this behavior using two machines I am working with.
If it helps, here's my configuration:
- nvidia driver: NVIDIA-SMI 510.73.08, Driver Version 510.73.08, CUDA Version 11.6
- nvidia-docker2: 2.10.0-1
- OS: Ubuntu 20.04