
YOLOv5s TorchScript model shows PyTorch backend bugs?

See original GitHub issue

Bug: This is a bug that appears with an exported YOLOv5s traced TorchScript model on Triton Inference Server.

Environment

  • OS: Ubuntu 20.04
  • GPU: RTX 3090

To Reproduce: I first export the yolov5s model to TorchScript with batch size 8 and image size 320 using their models/export.py script. Then I serve this model with the Triton Inference Server docker container (image nvcr.io/nvidia/tritonserver:20.12-py3) and the following config.pbtxt. When I run inference against this model, I get the error below.

name: "model"
platform: "pytorch_libtorch"
max_batch_size: 8

input {
    name: "input__0"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [3,-1,-1]
}
output [
    {
        name: "output__0"
        data_type: TYPE_FP32
        dims: [3,-1,-1,-1]
    },
    {
        name: "output__1"
        data_type: TYPE_FP32
        dims: [3,-1,-1,-1]
    },
    {
        name: "output__2"
        data_type: TYPE_FP32
        dims: [3,-1,-1,-1]
    }
]
instance_group [
  {
    count: 1
    kind: KIND_GPU
  }
]
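
For reference, the export step described above boils down to a trace like the following. This is a sketch, not the exact script: models/export.py flag names vary across YOLOv5 versions, and autoshape=False is the torch.hub option that returns the raw network (an assumption, not taken from the report itself).

import torch

# Load the raw yolov5s network; autoshape=False skips the pre/post-processing
# wrapper so the model can be traced on a plain tensor.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", autoshape=False).eval()

# Trace with batch size 8 and image size 320, as described in the report (NCHW).
dummy = torch.zeros(8, 3, 320, 320)
traced = torch.jit.trace(model, dummy)

# Triton's pytorch_libtorch backend expects <model_repository>/model/1/model.pt.
traced.save("model.pt")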

Triton Information: docker image nvcr.io/nvidia/tritonserver:20.12-py3

Description: Inference fails with the following output:

InferenceServerException: PyTorch execute failure: isTensor() INTERNAL ASSERT FAILED at "/opt/tritonserver/include/torch/ATen/core/ivalue_inl.h":137, please report a bug to PyTorch. Expected Tensor but got GenericList
Exception raised from toTensor at /opt/tritonserver/include/torch/ATen/core/ivalue_inl.h:137 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6c (0x7f1c9112c6cc in /opt/tritonserver/backends/pytorch/libc10.so)
frame #1: <unknown function> + 0x29346 (0x7f1c91678346 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #2: <unknown function> + 0x1320d (0x7f1c9166220d in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #3: <unknown function> + 0x18ee3 (0x7f1c91667ee3 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #4: TRITONBACKEND_ModelInstanceExecute + 0x387 (0x7f1c91669247 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #5: <unknown function> + 0x2da9b7 (0x7f1ce2ab79b7 in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #6: <unknown function> + 0xf1240 (0x7f1ce28ce240 in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #7: <unknown function> + 0xd6d84 (0x7f1ce2317d84 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #8: <unknown function> + 0x9609 (0x7f1ce27b2609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0)
frame #9: clone + 0x43 (0x7f1ce2005293 in /usr/lib/x86_64-linux-gnu/libc.so.6)

Expected behavior: I tested the TorchScript model in plain Python, where PyTorch returns outputs with the following shapes:

torch.Size([1, 3, 24, 40, 6])
torch.Size([1, 3, 12, 20, 6])
torch.Size([1, 3, 6, 10, 6])

However, when I run inference through the Triton client (httpclient.InferenceServerClient), it raises the errors shown above.
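
For context, a minimal sketch of such a client call with tritonclient (the server URL and the 192x320 input shape are assumptions; the output names follow the config above):

import numpy as np
import tritonclient.http as httpclient

# Connect to a locally running Triton server (default HTTP port assumed).
client = httpclient.InferenceServerClient(url="localhost:8000")

# 192x320 matches the grid sizes above (24x40, 12x20, 6x10 at strides 8/16/32).
image = np.zeros((1, 3, 192, 320), dtype=np.float32)
inp = httpclient.InferInput("input__0", list(image.shape), "FP32")
inp.set_data_from_numpy(image)

# Request the three output heads declared in config.pbtxt.
outputs = [httpclient.InferRequestedOutput(f"output__{i}") for i in range(3)]

# With the original traced model, this call fails with the GenericList error above.
result = client.infer(model_name="model", inputs=[inp], outputs=outputs)
for i in range(3):
    print(result.as_numpy(f"output__{i}").shape)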

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

1 reaction
CoderHam commented, Jan 5, 2021

Since the LibTorch backend assumes that each input/output is a single Tensor rather than a List of Tensors or a GenericList, you must create wrapper code around the Yolov5 model so that the inputs and outputs of the traced model preserve this assumption. This should be straightforward.

The PyTorch community uses a list of tensors instead of a single tensor as I/O for many detection models, and this is a common issue we have seen. However, due to the lack of metadata available from the TorchScript model, Triton must operate with the aforementioned assumption.
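
A minimal sketch of such a wrapper, assuming the model in eval mode returns a (detections, feature_list) tuple as YOLOv5 does; the class name and the commented usage lines are illustrative:

import torch

class SingleTensorYolo(torch.nn.Module):
    """Hypothetical wrapper: forwards through YOLOv5 and returns only the
    concatenated detection tensor, satisfying the LibTorch backend's
    one-tensor-per-output assumption."""

    def __init__(self, model: torch.nn.Module):
        super().__init__()
        self.model = model

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumption: in eval mode YOLOv5 returns (torch.cat(z, 1), x), i.e. a
        # tensor plus a list of per-stride feature maps; keep only the tensor.
        return self.model(x)[0]

# Usage sketch (base_model is the loaded YOLOv5 network):
# wrapped = SingleTensorYolo(base_model).eval()
# traced = torch.jit.trace(wrapped, torch.zeros(8, 3, 192, 320))
# traced.save("model.pt")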

0 reactions
QingYuan-L commented, Oct 28, 2021

@luvwinnie hi man, I solved the problem by modifying the model's forward: here, in yolov5, change return x if self.training else (torch.cat(z, 1), x) to return x if self.training else torch.cat(z, 1), then export again. You can also change the batch size to 1 so that Triton can send the input data and receive the single tensor.
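
A quick way to verify the re-exported model, as a sketch assuming a 192x320 input (which matches the grid sizes reported above):

import torch

# Load the re-exported TorchScript model and confirm it now returns a single
# tensor rather than a tuple or list.
traced = torch.jit.load("model.pt")
out = traced(torch.zeros(1, 3, 192, 320))
print(type(out))   # expect <class 'torch.Tensor'>
print(out.shape)   # e.g. [1, 3780, 6], since 3*(24*40 + 12*20 + 6*10) = 3780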

Read more comments on GitHub >

Top Results From Across the Web

Custom Yolov5s TorchScript model does not work on Android ...
I have custom model trained on yolov5s v5 and I converted it to torchscript.ptl using ultralytics export.py with code modification as told here....
Read more >
ultralytics/yolov5: v6.0 - YOLOv5n 'Nano' models, Roboflow ...
Nano models maintain the YOLOv5s depth multiple of 0.33 but reduce the ... PyTorch Hub cv2 .save() .show() bug fix by @glenn-jocher in ......
Read more >
Converting YOLOv5 PyTorch Model Weights to TensorFlow ...
Testing the YOLOv5 Model Weights Locally. This step is optional but recommended. In this short test, I'll show you how to feed your...
Read more >
PyTorch 1.11.0 Now Available - Exxact Corporation
getitem used to be quantized in FX Graph Mode Quantization , and it is no longer quantized. This won't break any models but...
Read more >
Import Torch Python
TorchScript is a way to create serializable and optimizable models from PyTorch code. functional as f # layers, activations and more import torch....
Read more >
