How to deploy a Detectron2 model using PyTorch?
See original GitHub issue

I exported a Detectron2 model to TorchScript and am trying to deploy it in Triton Inference Server. The model I use is detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl
My config.pbtxt is below:
name: "testmodel"
platform: "pytorch_libtorch"
max_batch_size: 0
input [
  {
    name: "conv2d__0"
    data_type: TYPE_FP32
    dims: [1056, 1920, 3]
  }
]
output [
  {
    name: "bboxex__0"
    data_type: TYPE_FP32
    dims: [-1, 4]
  },
  {
    name: "scores__1"
    data_type: TYPE_FP32
    dims: [-1]
  },
  {
    name: "classes__2"
    data_type: TYPE_INT32
    dims: [-1]
  },
  {
    name: "masks__3"
    data_type: TYPE_BOOL
    dims: [-1]
  }
]
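For reference, Triton loads a config.pbtxt like the one above from a model repository in which the config sits next to a numbered version directory containing the model file. A minimal standard-library sketch of creating that layout (the repository path and the config contents here are illustrative, not a full working config):

```python
from pathlib import Path

# Triton model repository layout (paths are illustrative):
# model_repository/
#   testmodel/
#     config.pbtxt
#     1/
#       model.pt        <- the exported TorchScript file goes here
repo = Path("model_repository")
version_dir = repo / "testmodel" / "1"
version_dir.mkdir(parents=True, exist_ok=True)

# Write a stub config; the real file would contain the full
# input/output specification shown above.
config_text = 'name: "testmodel"\nplatform: "pytorch_libtorch"\n'
(repo / "testmodel" / "config.pbtxt").write_text(config_text)

print(sorted(p.as_posix() for p in repo.rglob("*")))
```

Triton is then pointed at the repository root (e.g. `tritonserver --model-repository=model_repository`), and the TorchScript file is copied into the version directory.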
When I run inference against the model, I receive an error:
File "code/__torch__/detectron2/modeling/meta_arch/rcnn.py", line 348, in forward
_9 = torch.slice(max_size, 0, -2, 9223372036854775807, 1)
_10 = torch.add(_9, CONSTANTS.c2, alpha=1)
_11 = torch.floor_divide(_10, CONSTANTS.c3)
~~~~~~~~~~~~~~~~~~ <--- HERE
max_size0 = torch.cat([_8, torch.mul(_11, CONSTANTS.c3)], 0)
h = ops.prim.NumToTensor(torch.size(t, 1))
Traceback of TorchScript, original code (most recent call last):
/usr/local/lib/python3.6/dist-packages/torch/tensor.py(424): __floordiv__
/usr/local/lib/python3.6/dist-packages/torch/tensor.py(22): wrapped
/usr/local/lib/python3.6/dist-packages/detectron2/structures/image_list.py(98): from_tensors
/usr/local/lib/python3.6/dist-packages/detectron2/modeling/meta_arch/rcnn.py(222): preprocess_image
/usr/local/lib/python3.6/dist-packages/detectron2/modeling/meta_arch/rcnn.py(196): inference
/usr/local/lib/python3.6/dist-packages/detectron2/modeling/meta_arch/rcnn.py(149): forward
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py(704): _slow_forward
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py(720): _call_impl
/root/detectron2/tools/deploy/mymodel.py(70): forward
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py(704): _slow_forward
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py(720): _call_impl
/usr/local/lib/python3.6/dist-packages/torch/jit/__init__.py(1109): trace_module
/usr/local/lib/python3.6/dist-packages/torch/jit/__init__.py(955): trace
<stdin>(1):
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
Before this error I hit the same device-mismatch error several times in other places, and I fixed those by changing CPU operations to GPU operations with .cuda().
How should I solve this problem? Or is there another way to deploy a Detectron2 model?
Thank you for reading.
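For context, the arithmetic that fails is detectron2's size-divisibility padding in ImageList.from_tensors: the batch's maximum height and width are rounded up to the nearest multiple of the backbone stride, which is exactly the add / floor_divide / mul sequence visible in the traced code. A torch-free re-implementation of just that rounding (the stride value of 32 is illustrative; the real value comes from the model's size_divisibility):

```python
def round_up(sizes, stride=32):
    """Round each dimension up to a multiple of `stride`, mirroring the
    (max_size + (stride - 1)) // stride * stride arithmetic performed in
    detectron2's ImageList.from_tensors (illustrative re-implementation)."""
    return [(s + stride - 1) // stride * stride for s in sizes]

print(round_up([1056, 1920]))  # already multiples of 32 -> unchanged
print(round_up([1000, 600]))   # rounded up to [1024, 608]
```

In the traced model this runs on the `max_size` tensor, which is why a device mismatch on that one tensor is enough to abort inference.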
Issue Analytics
- Created: 3 years ago
- Comments: 24 (7 by maintainers)
Top GitHub Comments
I had the same problem and found the solution. This problem is indeed caused by the tensor max_size, which is a tensor on the CPU. When I modified the detectron2 code to move the variable to the GPU, Triton worked fine. The code is here; I just added .to('cuda') to make it work.
Another solution is to modify the TorchScript code: unzip the TorchScript model file and edit the corresponding code. In the FasterRCNN model the code is in archive/code/__torch__/detectron2/export/flatten.py; add the line max_size = torch.to(max_size, dtype=4, layout=0, device=torch.device("cuda"), pin_memory=None, non_blocking=False, copy=False, memory_format=None) after max_size first appears, then zip the folder again and it will also work in Triton.
I don't know if the problem comes from detectron2 or Triton. Although the problem was solved by modifying the detectron2 code, the original TorchScript model works fine in PyTorch. I hope the bug can be fixed soon. @stella-ds @CoderHam
There will be a fix for the torchvision build in the upcoming release that will allow you to run Detectron2 successfully.