Docker fails to register CUDA shared memory
Description
The Triton client is unable to register CUDA shared memory when the script is launched directly by the docker command, although it works when the script is run from inside the container in interactive mode.
Triton Information
What version of Triton are you using?
Server: nvcr.io/nvidia/tritonserver:21.03-py3
Client: nvcr.io/nvidia/tritonserver:21.03-py3-sdk
Are you using the Triton container or did you build it yourself? I am using the Triton container
Dockerfile.client
FROM nvcr.io/nvidia/tritonserver:21.03-py3-sdk
RUN apt update && apt install -y libb64-dev ffmpeg
docker-compose.yml
version: '2.3'
services:
  triton-server:
    container_name: triton-server
    image: nvcr.io/nvidia/tritonserver:21.03-py3
    privileged: true
    runtime: nvidia
    shm_size: '2gb'
    ports:
      - "8000:8000"
      - "8001:8001"
      - "8002:8002"
    ipc: host
    ulimits:
      stack: 67108864
      memlock: -1
    environment:
      - LD_PRELOAD=/plugins/liblayerplugin.so
      - log-verbose=4
    command: bash -c "tritonserver --model-repository=/models --strict-model-config=false --grpc-infer-allocation-pool-size=16"
  triton-client:
    container_name: triton-client
    build:
      context: .
    network_mode: 'host'
    working_dir: /app/src
    depends_on:
      - triton-server
    environment:
      - log-verbose=4
    privileged: true
    runtime: nvidia
    shm_size: '2gb'
    command: bash -c "python3 simple_grpc_cudashm_client.py --verbose"
To Reproduce
Build the client container and then run `docker-compose up`. The triton-client container will execute the script simple_grpc_cudashm_client.py, but it will throw the following error:
unregister_system_shared_memory, metadata ()
triton-client |
triton-client | Unregistered all system shared memory regions
triton-client | unregister_cuda_shared_memory, metadata ()
triton-client |
triton-client | Unregistered all cuda shared memory regions
triton-client | register_cuda_shared_memory, metadata ()
triton-client | name: "output0_data"
triton-client | raw_handle: "\260iu\001\000\000\000\000\001\000\000\000\000\000\000\000\000\0002\000\000\000\000\000 \003\000\000\000\000\000\000\000\000\000\000\000\001\000\000\000\000\000\0001\000\000\000\000\000\000\000\242\000\320\301\216\000\000\\\000\000\000\000"
triton-client | byte_size: 3276800
triton-client |
triton-client | Traceback (most recent call last):
triton-client | File "simple_grpc_cudashm_client.py", line 61, in <module>
triton-client | triton_client.register_cuda_shared_memory(
triton-client | File "/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py", line 906, in register_cuda_shared_memory
triton-client | raise_error_grpc(rpc_error)
triton-client | File "/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py", line 61, in raise_error_grpc
triton-client | raise get_error_grpc(rpc_error) from None
triton-client | tritonclient.utils.InferenceServerException: [StatusCode.INVALID_ARGUMENT] failed to register CUDA shared memory region 'output0_data'
The curious thing is what happens when I run the script from inside the container. If you start the container with `docker-compose run triton-client bash` and then, from the terminal inside the container, execute `python3 simple_grpc_cudashm_client.py --verbose`, the client works as expected without errors. This is the output generated in this case:
unregister_system_shared_memory, metadata ()
Unregistered all system shared memory regions
unregister_cuda_shared_memory, metadata ()
Unregistered all cuda shared memory regions
register_cuda_shared_memory, metadata ()
name: "output0_data"
raw_handle: " 3\225\001\000\000\000\000\t\000\000\000\000\000\000\000\000\0002\000\000\000\000\000 \003\000\000\000\000\000\000\000\000\000\000\000\001\000\000\000\000\000\0001\000\000\000\000\000\000\000\304\000\320\301\216\000\000\\\000\000\000\000"
byte_size: 3276800
Registered cuda shared memory with name 'output0_data'
register_cuda_shared_memory, metadata ()
name: "input0_data"
raw_handle: " 3\225\001\000\000\000\000\t\000\000\000\000\000\000\000\000\0002\000\000\000\000\000 \003\000\000\000\000\000\000\000\000\000\000\000\001\000\000\000\000\000\0002\000\000\000\000\000\000\000\304\000\320\301\220\000\000\\\000\000\000\000"
byte_size: 3276800
Registered Cuda shared memory with name 'input0_data'
It's important to note that running `docker-compose run triton-client python3 simple_grpc_cudashm_client.py --verbose` also generates the same error.
Attachments
Script simple_grpc_cudashm_client.py
import os
import json
import sys
import argparse
import numpy as np
import tritonclient.grpc as grpcclient
from tritonclient import utils
import tritonclient.utils.cuda_shared_memory as cudashm
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-v',
                        '--verbose',
                        action="store_true",
                        required=False,
                        default=False,
                        help='Enable verbose output')
    parser.add_argument('-u',
                        '--url',
                        type=str,
                        required=False,
                        default='localhost:8001',
                        help='Inference server URL. Default is localhost:8001.')
    FLAGS = parser.parse_args()

    try:
        triton_client = grpcclient.InferenceServerClient(url=FLAGS.url,
                                                         verbose=FLAGS.verbose)
    except Exception as e:
        print("channel creation failed: " + str(e))
        sys.exit(1)

    triton_client.unregister_system_shared_memory()
    triton_client.unregister_cuda_shared_memory()

    model_name = "test"
    model_version = "latest"

    input_byte_size = 3276800  # 1600x512x4bytes
    output_byte_size = input_byte_size

    shm_op0_handle = cudashm.create_shared_memory_region(
        "output0_data", output_byte_size, 0)
    triton_client.register_cuda_shared_memory(
        "output0_data", cudashm.get_raw_handle(shm_op0_handle), 0,
        output_byte_size)

    shm_ip0_handle = cudashm.create_shared_memory_region(
        "input0_data", input_byte_size, 0)
    triton_client.register_cuda_shared_memory(
        "input0_data", cudashm.get_raw_handle(shm_ip0_handle), 0,
        input_byte_size)
Does anyone have an idea why this happens when launching the triton-client with docker-compose up?
Thanks
Top GitHub Comments
I was able to root cause this issue. It looks like the problem is that cudaIpcOpenMemHandle will return an invalid context if the handle's source PID matches the destination process PID. When using Docker, each container has its own process ID namespace, which allows the same process IDs to be reused in both containers. When interactive mode is used, the PIDs end up being different and there are no issues. The fix would be to add the --pid host flag to both of the containers. I was able to reproduce the issue locally and confirm that adding the flag fixes the issue. A minimal compose sketch of that change is shown below.
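As an illustration of that fix (a sketch, not taken from the original issue): in the docker-compose.yml above, the Compose equivalent of the --pid host flag is the service-level pid: host option, added to both services. Only the relevant keys are shown:

```yaml
# Sketch of the suggested fix: run both containers in the host PID namespace
# so the client and server processes cannot end up with the same (namespaced)
# PID, which is what makes cudaIpcOpenMemHandle return an invalid context.
version: '2.3'
services:
  triton-server:
    image: nvcr.io/nvidia/tritonserver:21.03-py3
    pid: host          # Compose equivalent of `docker run --pid host`
    # ... remaining triton-server settings unchanged ...
  triton-client:
    build:
      context: .
    pid: host          # Compose equivalent of `docker run --pid host`
    # ... remaining triton-client settings unchanged ...
```

With this in place, `docker-compose up` should register the CUDA shared memory regions the same way the interactive `docker-compose run` session does.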
Hi @rmccorm4
I've been able to reproduce this issue using the 'simple' model, running the docker image nvcr.io/nvidia/tritonserver:22.04-py3. It happens when I run both the server and the client in non-interactive mode.
To reproduce, cd to server/docs/examples/model_repository (I ran fetch_models.sh instead of deleting the rest), then:
- Start the triton-server
- Start the triton-client test
This fails with the error message:
If I start either the server or the triton client in interactive mode, the test passes. For example, I could start the server like this:
I have verified this behavior using two machines I am working with.
If it helps, here's my configuration:
- nvidia driver: NVIDIA-SMI 510.73.08, Driver Version 510.73.08, CUDA Version 11.6
- nvidia-docker2: 2.10.0-1
- OS: Ubuntu 20.04