Deploy Detectron2 Mask R-CNN inside Triton

Description

With Detectron2, I trained a Mask R-CNN model based on the following architecture: link to yaml file.
I converted the model to TorchScript using the script provided by the Detectron2 team (link to script), so it is now in .pt format.

I prepared the config.pbtxt file, created a model repository as described in your documentation, and put the config and the trained model there.

Structure of model repo

models_torchscript
 └ mask_rcnn
  ├ config.pbtxt
  └ 1
    └ model.pt

Content of config.pbtxt

name: "mask_rcnn"
platform: "pytorch_libtorch"
max_batch_size: 0
input [
    {
            name: "INPUT__0"
            data_type: TYPE_FP32
            dims: [1, 3, 800, 800]
    },
    {
            name: "INPUT__1"
            data_type: TYPE_FP32
            dims: [1, 1, 3]
    }
]
output [
    {
            name: "OUTPUT__0"
            data_type: TYPE_FP32
            dims: [16]
    },
    {
            name: "OUTPUT__1"
            data_type: TYPE_FP32
            dims: [16]
    }
]

I start the server with the following command:

docker run \
--gpus=1 --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
-p8000:8000 -p8001:8001 -p8002:8002 \
-v $PWD/models_torchscript:/models \
nvcr.io/nvidia/tritonserver:20.08-py3 tritonserver \
--model-repository=/models \
--strict-model-config=false \
--log-verbose=1

The server starts without errors.
When I try to run inference using your client library (apologies for the code, it is a quick and dirty script):

image = np.zeros((1, 3, 800, 800)).astype(np.float32)
im_info = np.float32((800, 800, 1))
im_info = np.reshape(im_info, (1, -1))
im_info = np.expand_dims(im_info, axis=0)

dtype = "FP32"

input_1 = httpclient.InferInput("INPUT__0", image.shape, dtype)
input_1.set_data_from_numpy(image, binary_data=False)

input_2 = httpclient.InferInput("INPUT__1", im_info.shape, dtype)
input_2.set_data_from_numpy(im_info, binary_data=False)

inputs = [input_1, input_2]

output_1 = httpclient.InferRequestedOutput("OUTPUT__0", binary_data=False, class_count=1)
output_2 = httpclient.InferRequestedOutput("OUTPUT__1", binary_data=False, class_count=1)

outputs = [output_1, output_2]

response = triton_client.infer(FLAGS.model_name, inputs, request_id=str("loool"), model_version=FLAGS.model_version, outputs=outputs)

I get the following error:

I0915 18:11:15.486762 1 libtorch_backend.cc:552] Running mask_rcnn_0_gpu0 with 1 requests
I0915 18:11:15.486818 1 pinned_memory_manager.cc:130] pinned memory allocation: size 7680000, addr 0x7fcfa8000090
I0915 18:11:15.488582 1 pinned_memory_manager.cc:130] pinned memory allocation: size 12, addr 0x7fcfa87530a0
I0915 18:11:15.490377 1 libtorch_backend.cc:776] Expected at most 2 argument(s) for operator 'forward', but received 3 argument(s). Declaration: forward(__torch__.detectron2.export.caffe2_modeling.___torch_mangle_857.Caffe2GeneralizedRCNN self, (Tensor, Tensor) argument_1) -> ((Tensor, Tensor, Tensor, Tensor))
Exception raised from checkAndNormalizeInputs at /tmp/pip-req-build-gk_ormv_/aten/src/ATen/core/function_schema_inl.h:245 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7fd0447df94b in /opt/tritonserver/lib/pytorch/libc10.so)
frame #1: <unknown function> + 0x82a067 (0x7fd0c19a7067 in /opt/tritonserver/lib/pytorch/libtorch_cpu.so)
frame #2: torch::jit::GraphFunction::operator()(std::vector<c10::IValue, std::allocator<c10::IValue> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, c10::IValue, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, c10::IValue> > > const&) + 0x2d (0x7fd0c3a91cad in /opt/tritonserver/lib/pytorch/libtorch_cpu.so)
frame #3: torch::jit::Method::operator()(std::vector<c10::IValue, std::allocator<c10::IValue> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, c10::IValue, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, c10::IValue> > > const&) + 0x109 (0x7fd0c3aa19d9 in /opt/tritonserver/lib/pytorch/libtorch_cpu.so)
frame #4: <unknown function> + 0x27fc87 (0x7fd0e3a7cc87 in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #5: <unknown function> + 0x286e4d (0x7fd0e3a83e4d in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #6: <unknown function> + 0x98000 (0x7fd0e3895000 in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #7: <unknown function> + 0xafaf7 (0x7fd0e38acaf7 in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #8: <unknown function> + 0xbd6df (0x7fd0e27996df in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #9: <unknown function> + 0x76db (0x7fd0e35e56db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #10: clone + 0x3f (0x7fd0e1e56a3f in /lib/x86_64-linux-gnu/libc.so.6)

I have no idea how to solve that issue. Could anybody help me out?

Triton Information

What version of Triton are you using? 20.08

Are you using the Triton container or did you build it yourself? I’m using Triton container: nvcr.io/nvidia/tritonserver:20.08-py3

Top GitHub Comments

SkalskiP commented, Sep 23, 2020

Letter to the people from the future

Below you will find an approximate path to deploy Detectron2 Mask R-CNN inside the Triton Inference Server:

Train your Detectron2 Mask R-CNN model in Python.
Convert the Detectron2 model to TorchScript using this script.
The original Detectron2 model requires only an image tensor for inference. The TorchScript model, however, requires an additional tensor describing the image dimensions, and the two arguments must be passed together as a tuple, Tuple[Tensor, Tensor]. According to the Detectron2 documentation:

All converted models (the .pb files) take two input tensors: “data” is an NCHW image, and “im_info” is an Nx3 tensor consisting of (height, width, 1.0) for each image (the shape of “data” might be larger than that in “im_info” due to padding).

The problem, however, is that Triton does not allow passing a tuple as an argument to the model’s forward pass. As a workaround, we can wrap the model in a thin wrapper module that accepts two separate Tensor arguments and builds the tuple inside its forward method.
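```
import torch

# Placeholder paths and device (not from the original comment); adjust to your setup.
SOURCE_MODEL_PATH = "path/to/exported_model.pt"  # TorchScript file from the export script
TARGET_MODEL_PATH = "path/to/model.pt"           # wrapped model for the Triton repository
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class Wrapper(torch.nn.Module):
    def __init__(self):
        super(Wrapper, self).__init__()
        # The exported model's forward() expects a single Tuple[Tensor, Tensor].
        self.model = torch.jit.load(SOURCE_MODEL_PATH).to(device)

    def forward(self, x: torch.Tensor, y: torch.Tensor):
        # Triton passes two separate tensors; pack them into the expected tuple.
        return self.model.forward((x, y))

# Script the wrapper and save it in a form Triton can load.
m = torch.jit.script(Wrapper())
m.save(TARGET_MODEL_PATH)
```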
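
To sanity-check the wrapper idea without the real exported model, here is a minimal sketch using a toy stand-in. `TupleInputModel` and `TwoTensorWrapper` are hypothetical names, not part of Detectron2 or Triton, and the stand-in only mimics the Tuple[Tensor, Tensor] signature. It should show that calling the scripted tuple-input model with two positional tensors is rejected, much like in the Triton log from the question, while the wrapped version accepts them.

```python
from typing import Tuple

import torch


class TupleInputModel(torch.nn.Module):
    """Toy stand-in (assumption): forward takes one tuple, like the exported model."""

    def forward(self, inputs: Tuple[torch.Tensor, torch.Tensor]) -> torch.Tensor:
        data, im_info = inputs
        return data.sum() * im_info.sum()


class TwoTensorWrapper(torch.nn.Module):
    """Accepts two separate tensors and builds the tuple internally."""

    def __init__(self, model: torch.nn.Module):
        super().__init__()
        self.model = model

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return self.model((x, y))


inner = torch.jit.script(TupleInputModel())
wrapped = torch.jit.script(TwoTensorWrapper(inner))

data = torch.ones(1, 3, 8, 8)
im_info = torch.tensor([[8.0, 8.0, 1.0]])

# Passing two positional tensors to the tuple-input model is rejected.
try:
    inner(data, im_info)
    direct_call_failed = False
except (RuntimeError, TypeError):
    direct_call_failed = True

# The wrapper takes the same two tensors and forwards them as a tuple.
result = wrapped(data, im_info)
print(direct_call_failed, result.item())
```

The same pattern applies to the real model: only the inner module changes, the wrapper signature stays at two plain tensors.
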
Create a model repository on the host machine, as described in the documentation. Put the resulting model.pt file in the correct place in that folder structure.
```
models_torchscript
└─ mask_rcnn
   ├─ config.pbtxt
   ├─ 1
   │  └─ model.pt
   └─ 2
      └─ model.pt
```
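
As a rough sketch, the layout above can be created with a few lines of Python. The helper name `create_model_repository` is hypothetical, the repository is built in a temporary directory purely for illustration, and config.pbtxt is created empty as a placeholder to be filled in with the real configuration.

```python
import tempfile
from pathlib import Path


def create_model_repository(root: Path, model_name: str, version: int = 1) -> Path:
    """Create <root>/<model_name>/<version>/ and an empty config.pbtxt placeholder."""
    version_dir = root / model_name / str(version)
    version_dir.mkdir(parents=True, exist_ok=True)
    (root / model_name / "config.pbtxt").touch()
    return version_dir


# Illustration only: a temporary directory stands in for the real host path.
repo_root = Path(tempfile.mkdtemp()) / "models_torchscript"
version_dir = create_model_repository(repo_root, "mask_rcnn", version=1)

# The wrapped TorchScript file would then be copied to version_dir / "model.pt".
print(version_dir)
```
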

Create a config.pbtxt file with the content below (see the model configuration documentation):

name: "mask_rcnn"
platform: "pytorch_libtorch"
max_batch_size: 0
input [
    {
        name: "INPUT__0"
        data_type: TYPE_FP32
        dims: [1, 3, 800, 800]
    },
    {
        name: "INPUT__1"
        data_type: TYPE_FP32
        dims: [1, 3]
    }
]
output [
    {
        name: "OUTPUT__0"
        data_type: TYPE_FP32
        dims: [-1, 4]
    },
    {
        name: "OUTPUT__1"
        data_type: TYPE_FP32
        dims: [-1]
    },
    {
        name: "OUTPUT__2"
        data_type: TYPE_FP32
        dims: [-1]
    },
    {
        name: "OUTPUT__3"
        data_type: TYPE_FP32
        dims: [-1, 1, 28, 28]
    }
]
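
To make the role of the -1 entries concrete, the sketch below checks a tensor shape against a dims list, treating -1 as "any size". It is a simplified illustration and not Triton's actual validation code; `shape_matches` is a hypothetical helper. With max_batch_size: 0, the dims describe the full tensor shape, so the number of dimensions has to match as well.

```python
from typing import Sequence


def shape_matches(config_dims: Sequence[int], actual_shape: Sequence[int]) -> bool:
    """Return True if actual_shape fits config_dims, where -1 means any size."""
    if len(config_dims) != len(actual_shape):
        return False
    return all(c == -1 or c == a for c, a in zip(config_dims, actual_shape))


# A variable number of detections fits the [-1, 4] box output.
boxes_ok = shape_matches([-1, 4], (17, 4))
# Masks for the same detections fit [-1, 1, 28, 28].
masks_ok = shape_matches([-1, 1, 28, 28], (17, 1, 28, 28))
# The (1, 1, 3) im_info from the question does not fit the [1, 3] input above.
im_info_ok = shape_matches([1, 3], (1, 1, 3))
print(boxes_ok, masks_ok, im_info_ok)
```
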

Run server:

docker run --gpus=1 --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
-p8000:8000 -p8001:8001 -p8002:8002 \
-v $PWD/models_torchscript/:/models \
nvcr.io/nvidia/tritonserver:20.08-py3 tritonserver \
--model-repository=/models \
--strict-model-config=false \
--log-verbose=1
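
On the client side, the request tensors have to match the input shapes in the config above: (1, 3, 800, 800) for INPUT__0 and (1, 3) for INPUT__1, rather than the (1, 1, 3) used in the question. A minimal NumPy sketch of building them follows; `build_inputs` is a hypothetical helper, and the zero-filled image is only a stand-in for real preprocessed data. The resulting arrays would be sent with the same httpclient.InferInput calls shown in the question.

```python
import numpy as np


def build_inputs(height: int = 800, width: int = 800):
    """Build placeholder request tensors matching the config's input dims."""
    # NCHW image tensor; zeros stand in for a real preprocessed image.
    image = np.zeros((1, 3, height, width), dtype=np.float32)
    # One row of (height, width, 1.0) per image, giving shape (N, 3) with N = 1.
    im_info = np.array([[height, width, 1.0]], dtype=np.float32)
    return image, im_info


image, im_info = build_inputs()
print(image.shape, im_info.shape)
```
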

CoderHam commented, Sep 22, 2020

Currently, Triton server does not support such complex structures.

The LibTorch (PyTorch) backend assumes that the inputs to the model are tensors, not tuples of tensors. I’d recommend building a wrapper around your model and tracing it to produce a version whose inputs are tensors instead of a tuple of tensors (e.g. pass a 4D tensor and convert it into a tuple of 3D tensors inside the model before passing it to Detectron2). PS: The above workaround worked for someone with Mask R-CNN.
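
A minimal sketch of this tracing variant, assuming a toy model in place of Detectron2: `TupleInputModel` and `StackedInputWrapper` are hypothetical names. The wrapper takes one 4D tensor, splits it into two 3D tensors along the first dimension, and passes them on as a tuple.

```python
from typing import Tuple

import torch


class TupleInputModel(torch.nn.Module):
    """Toy stand-in (assumption): forward takes a tuple of two 3D tensors."""

    def forward(self, inputs: Tuple[torch.Tensor, torch.Tensor]) -> torch.Tensor:
        first, second = inputs
        return first + second


class StackedInputWrapper(torch.nn.Module):
    """Takes one 4D tensor and unpacks it into the tuple the inner model expects."""

    def __init__(self, model: torch.nn.Module):
        super().__init__()
        self.model = model

    def forward(self, stacked: torch.Tensor) -> torch.Tensor:
        # stacked has shape (2, C, H, W); each slice is a 3D (C, H, W) tensor.
        return self.model((stacked[0], stacked[1]))


# Trace the wrapper with an example input of the expected shape.
example = torch.zeros(2, 3, 4, 4)
traced = torch.jit.trace(StackedInputWrapper(TupleInputModel()), example)

output = traced(torch.ones(2, 3, 4, 4))
print(tuple(output.shape))
```

Tracing records the operations for the example input, so this approach fits best when the input layout is fixed; the scripted two-tensor wrapper in the other comment is an alternative when the inputs differ in shape.
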

Closing this issue. Please re-open if the above workaround does not solve your problem.