Deploy Detectron2 Mask R-CNN inside Triton
Description
- With Detectron2, I trained a Mask R-CNN model based on the following architecture: link to yaml file.
- I converted my model to TorchScript format using the script provided by the Detectron2 team (link to script), so it is now in .pt format.
- I prepared the config.pbtxt file, created a model repository as described in your documentation, and put the config and the trained model there.
- Structure of the model repo:
    models_torchscript
    └ mask_rcnn
      ├ config.pbtxt
      └ 1
        └ model.pt
- Content of config.pbtxt:
    name: "mask_rcnn"
    platform: "pytorch_libtorch"
    max_batch_size: 0
    input [
      {
        name: "INPUT__0"
        data_type: TYPE_FP32
        dims: [1, 3, 800, 800]
      },
      {
        name: "INPUT__1"
        data_type: TYPE_FP32
        dims: [1, 1, 3]
      }
    ]
    output [
      {
        name: "OUTPUT__0"
        data_type: TYPE_FP32
        dims: [16]
      },
      {
        name: "OUTPUT__1"
        data_type: TYPE_FP32
        dims: [16]
      }
    ]
- I deploy the model inside the server with the command:
    docker run \
      --gpus=1 --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
      -p8000:8000 -p8001:8001 -p8002:8002 \
      -v $PWD/models_torchscript:/models \
      nvcr.io/nvidia/tritonserver:20.08-py3 tritonserver \
      --model-repository=/models \
      --strict-model-config=false \
      --log-verbose=1
- The server starts without errors.
- I then try to run inference using your client library (sorry for the code, it is a really quick and dirty script):
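import numpy as np

# httpclient, triton_client and FLAGS are defined elsewhere in the script (not shown).
image = np.zeros((1, 3, 800, 800)).astype(np.float32)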
im_info = np.float32((800, 800, 1))
im_info = np.reshape(im_info, (1, -1))
im_info = np.expand_dims(im_info, axis=0)
dtype = "FP32"
input_1 = httpclient.InferInput("INPUT__0", image.shape, dtype)
input_1.set_data_from_numpy(image, binary_data=False)
input_2 = httpclient.InferInput("INPUT__1", im_info.shape, dtype)
input_2.set_data_from_numpy(im_info, binary_data=False)
inputs = [input_1, input_2]
output_1 = httpclient.InferRequestedOutput("OUTPUT__0", binary_data=False, class_count=1)
output_2 = httpclient.InferRequestedOutput("OUTPUT__1", binary_data=False, class_count=1)
outputs = [output_1, output_2]
response = triton_client.infer(FLAGS.model_name, inputs, request_id=str("loool"), model_version=FLAGS.model_version, outputs=outputs)
- I get the following error:
I0915 18:11:15.486762 1 libtorch_backend.cc:552] Running mask_rcnn_0_gpu0 with 1 requests
I0915 18:11:15.486818 1 pinned_memory_manager.cc:130] pinned memory allocation: size 7680000, addr 0x7fcfa8000090
I0915 18:11:15.488582 1 pinned_memory_manager.cc:130] pinned memory allocation: size 12, addr 0x7fcfa87530a0
I0915 18:11:15.490377 1 libtorch_backend.cc:776] Expected at most 2 argument(s) for operator 'forward', but received 3 argument(s). Declaration: forward(__torch__.detectron2.export.caffe2_modeling.___torch_mangle_857.Caffe2GeneralizedRCNN self, (Tensor, Tensor) argument_1) -> ((Tensor, Tensor, Tensor, Tensor))
Exception raised from checkAndNormalizeInputs at /tmp/pip-req-build-gk_ormv_/aten/src/ATen/core/function_schema_inl.h:245 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7fd0447df94b in /opt/tritonserver/lib/pytorch/libc10.so)
frame #1: <unknown function> + 0x82a067 (0x7fd0c19a7067 in /opt/tritonserver/lib/pytorch/libtorch_cpu.so)
frame #2: torch::jit::GraphFunction::operator()(std::vector<c10::IValue, std::allocator<c10::IValue> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, c10::IValue, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, c10::IValue> > > const&) + 0x2d (0x7fd0c3a91cad in /opt/tritonserver/lib/pytorch/libtorch_cpu.so)
frame #3: torch::jit::Method::operator()(std::vector<c10::IValue, std::allocator<c10::IValue> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, c10::IValue, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, c10::IValue> > > const&) + 0x109 (0x7fd0c3aa19d9 in /opt/tritonserver/lib/pytorch/libtorch_cpu.so)
frame #4: <unknown function> + 0x27fc87 (0x7fd0e3a7cc87 in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #5: <unknown function> + 0x286e4d (0x7fd0e3a83e4d in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #6: <unknown function> + 0x98000 (0x7fd0e3895000 in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #7: <unknown function> + 0xafaf7 (0x7fd0e38acaf7 in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #8: <unknown function> + 0xbd6df (0x7fd0e27996df in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #9: <unknown function> + 0x76db (0x7fd0e35e56db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #10: clone + 0x3f (0x7fd0e1e56a3f in /lib/x86_64-linux-gnu/libc.so.6)
- I have no idea how to solve that issue. Could anybody help me out?
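The argument counts in the error message can be illustrated with a plain-Python analogy. This is only a sketch, not Triton or TorchScript code: `TupleInputModel` is a hypothetical stand-in whose `forward` is declared like the one in the log, taking `self` plus a single tuple. Passing the two configured inputs as separate positional arguments then adds up to three arguments, which appears to be what the backend did here.

```python
class TupleInputModel:
    """Hypothetical stand-in: forward takes self plus ONE (data, im_info) tuple."""

    def forward(self, inputs):
        data, im_info = inputs
        return data, im_info


model = TupleInputModel()

# Two separate positional inputs: self + 2 = 3 arguments, one more than declared.
try:
    model.forward("data", "im_info")
    error = ""
except TypeError as exc:
    error = str(exc)

# One tuple: self + 1 = 2 arguments, which matches the declaration.
result = model.forward(("data", "im_info"))

print(error)
print(result)
```

Under this reading, the model itself is fine; the mismatch is only in how the two tensors are handed to `forward`.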
Triton Information
What version of Triton are you using?
20.08
Are you using the Triton container or did you build it yourself?
I’m using Triton container: nvcr.io/nvidia/tritonserver:20.08-py3
Issue Analytics
- Created 3 years ago
- Comments:25 (5 by maintainers)
Top GitHub Comments
Letter to the people from the future
Below you will find an approximate path to deploy Detectron2 Mask R-CNN inside the Triton Inference Server:
- Train your Detectron2 Mask R-CNN model in Python.
- Convert the Detectron2 model to TorchScript using this script.
- The original Detectron2 model requires only an image tensor to run inference. The TorchScript model, however, requires an additional tensor with information about the image dimensions, and the two arguments must be passed together as a single tuple, Tuple[Tensor, Tensor]. According to the Detectron2 documentation: All converted models (the .pb files) take two input tensors: “data” is an NCHW image, and “im_info” is an Nx3 tensor consisting of (height, width, 1.0) for each image (the shape of “data” might be larger than that in “im_info” due to padding).
- The problem, however, is that Triton does not allow passing a Tuple as an argument to the network's forward pass. As a workaround, wrap the model in another dummy model that accepts two separate arguments of type Tensor and builds the Tuple inside its forward method.
- Create a model repository on the host machine, as described in the documentation. Put your output model.pt file in the correct place in that folder structure.
- Create a config.pbtxt file with the content below (model configuration documentation).
- Run server:
Triton server does not currently support such complex structures.
The Libtorch (PyTorch) backend operates on the assumption that the inputs to the model are tensors, not a tuple of tensors. I’d recommend building a wrapper around your model and tracing it to produce a version of your model whose inputs are tensors instead of a tuple of tensors (e.g. pass a 4D tensor and convert it into a tuple of 3D tensors inside the model before passing it to Detectron2). PS: The above workaround worked for someone with Mask R-CNN.
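A minimal sketch of such a wrapper, assuming PyTorch is installed. `TupleInputModel` and `TensorInputWrapper` are hypothetical names, and the inner model is a dummy that only mimics the tuple-taking `forward` signature so the sketch runs without Detectron2. In practice the inner module would presumably be the exported model (for example, loaded with `torch.jit.load`), and tracing could be used instead of scripting.

```python
from typing import Tuple

import torch


class TupleInputModel(torch.nn.Module):
    """Hypothetical stand-in for the exported model: forward takes ONE tuple."""

    def forward(
        self, inputs: Tuple[torch.Tensor, torch.Tensor]
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        data, im_info = inputs
        # Dummy computation; the real model would return detections here.
        return data * 2.0, im_info + 1.0


class TensorInputWrapper(torch.nn.Module):
    """Accepts two separate tensors and builds the tuple inside forward."""

    def __init__(self, model: torch.nn.Module) -> None:
        super().__init__()
        self.model = model

    def forward(self, data: torch.Tensor, im_info: torch.Tensor):
        return self.model((data, im_info))


# Assumption: in a real setup `inner` would be the converted Detectron2 model.
inner = torch.jit.script(TupleInputModel())
wrapped = torch.jit.script(TensorInputWrapper(inner))

# The wrapped module is now called with two plain tensors.
outputs = wrapped(
    torch.zeros(1, 3, 800, 800),
    torch.tensor([[800.0, 800.0, 1.0]]),
)
```

Saving the result with `wrapped.save("model.pt")` should then give a TorchScript file whose `forward` takes two tensors, so each one can be declared as its own input in config.pbtxt.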
Closing this issue. Please re-open if the above workaround (WAR) does not solve your problem.