
ONNX-exported torchvision FasterRCNN fails on inference request

See original GitHub issue

Description

Internal ONNX error related to dims when running an ONNX-exported torchvision FasterRCNN on TRTIS. The error is as follows:

    [E:onnxruntime:, sequential_executor.cc:183 Execute] Non-zero status code returned while running ReduceMax node. Name:'' Status Message: /workspace/onnxruntime/onnxruntime/core/providers/cuda/reduction/reduction_ops.cc:110 onnxruntime::common::Status onnxruntime::cuda::PrepareForReduce(onnxruntime::OpKernelContext*, bool, const std::vector<long int>&, const onnxruntime::Tensor**, onnxruntime::Tensor**, int64_t&, int64_t&, std::vector<long int>&, std::vector<long int>&, std::vector<long int>&) keepdims || dim != 0 was false. Can't reduce on dim with value of 0 if 'keepdims' is false. Invalid output shape would be produced. input_shape:{0,4}
    Stacktrace:
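As noted in the comments below, this boils down to a reduction over a zero-sized dimension (an image with no detections). A minimal standalone graph along these lines reproduces a similar failure (a sketch for illustration only, not taken from the issue; exact behavior can vary by onnxruntime version and execution provider):

    import numpy as np
    from onnx import TensorProto, helper
    import onnxruntime as ort

    # ReduceMax over axis 0 with keepdims=0, mirroring the failing node's setup.
    node = helper.make_node("ReduceMax", ["x"], ["y"], axes=[0], keepdims=0)
    graph = helper.make_graph(
        [node], "reduce_empty",
        [helper.make_tensor_value_info("x", TensorProto.FLOAT, ["n", 4])],
        [helper.make_tensor_value_info("y", TensorProto.FLOAT, [4])],
    )
    onnx_model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 11)])

    sess = ort.InferenceSession(onnx_model.SerializeToString())
    # A (0, 4) input -- i.e. zero detections -- is expected to trigger the
    # "Can't reduce on dim with value of 0 if 'keepdims' is false" error.
    sess.run(None, {"x": np.zeros((0, 4), dtype=np.float32)})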

TRTIS Information

Running container with tag nvcr.io/nvidia/tensorrtserver:20.02-py3.

To Reproduce

The model is a torchvision.models.detection.FasterRCNN, exported as follows:

    import os

    import torch

    # model is the fine-tuned FasterRCNN instance; images is the sample input
    # used to trace the export.
    outputs = ["boxes", "labels", "scores"]
    dynamic_axes_dict = {output_name: {0: "detections"}
                         for output_name in outputs}
    torch.onnx.export(model, images, os.path.join(output_dir, "model.onnx"),
                      export_params=True,        # store weights in the model file
                      do_constant_folding=True,  # const folding for optimization
                      opset_version=11,          # opset version 11 req for maskrcnn
                      input_names=["images"],
                      output_names=outputs,
                      dynamic_axes=dynamic_axes_dict,
                      # keep_initializers_as_inputs=True,
                      verbose=True)

The exported model’s output has been validated against the native PyTorch checkpoint model’s output. Everything looks good when the model is run locally in an onnxruntime session.
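A local check along these lines exercises that path (a minimal sketch, not the actual validation script from the issue; the pretrained backbone, input size, and tolerance are assumptions):

    import numpy as np
    import onnxruntime as ort
    import torch
    import torchvision

    # Assumption: a stock pretrained model stands in for the fine-tuned checkpoint.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True).eval()
    image = torch.rand(3, 745, 1324)  # CHW input matching the TRTIS config dims

    with torch.no_grad():
        torch_out = model([image])[0]  # dict of 'boxes', 'labels', 'scores'

    sess = ort.InferenceSession("model.onnx")
    ort_boxes, ort_labels, ort_scores = sess.run(None, {"images": image.numpy()})

    # Detection counts can differ by a box or two near the score threshold, so
    # only compare element-wise when the counts actually match.
    if torch_out["boxes"].shape[0] == ort_boxes.shape[0]:
        np.testing.assert_allclose(torch_out["boxes"].numpy(), ort_boxes,
                                   rtol=1e-3, atol=1e-4)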

The TRTIS config is as follows:

name: "torch_detection_rcnn"
platform: "onnxruntime_onnx"
input [
  {
    name: "images"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 745, 1324 ]
  }
]
output [
  {
    name: "boxes"
    data_type: TYPE_FP32
    dims: [ -1, 4 ]
  },
  {
    name: "labels"
    data_type: TYPE_INT64
    dims: [ -1 ]
  },
  {
    name: "scores"
    data_type: TYPE_FP32
    dims: [ -1 ]
  }
]

The model is loaded by TRTIS with no complaints, but the error occurs during inference request handling.

Expected behavior

Since I’m able to validate the model and load it onto TRTIS without issue, I expected it to handle inference requests as well.

I’m not sure where to go from here, so I had a few questions.

  • Is there a particular opset we should be exporting with? I’m using 11 because torch requires it for this model.
  • Should we be exporting it with all initializers kept as inputs? I tried this and ran into a different error that was more vague.

I’m at a bit of a roadblock so let me know if you need any more information. Also, please let me know if y’all have any suggestions for what to look into and experiment with from here.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
deadeyegoodwin commented, Apr 15, 2020

Is this still not working? If so, please reopen.

0 reactions
dsandii commented, Apr 16, 2020

Apologies for the delayed response. This issue should remain closed, but I just wanted to update the info here.

I believe the problem here arises from the dynamic axes (for number of detections). It appears that something in the ONNX code doesn’t account for dynamic axes being potentially 0 (no detections).

I simply modified the graph before exporting to pad all output tensors to a fixed number of elements and was able to bypass this issue.
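The actual workaround code isn’t included in the issue, but a wrapper along these lines illustrates the idea (a minimal sketch; the wrapper name, the max_detections value, and the zero-padding scheme are assumptions, not taken from the issue):

    import torch


    class PaddedFasterRCNN(torch.nn.Module):
        """Hypothetical wrapper: pads/truncates detections to a fixed count so
        the exported graph never emits zero-sized output tensors."""

        def __init__(self, model, max_detections=100):
            super().__init__()
            self.model = model
            self.max_detections = max_detections

        def forward(self, images):
            det = self.model(images)[0]  # torchvision per-image detection dict
            k = self.max_detections
            # Concatenate fixed-size filler then slice; this stays traceable
            # even when the number of detections is dynamic (including zero).
            boxes = torch.cat([det["boxes"], det["boxes"].new_zeros(k, 4)])[:k]
            labels = torch.cat([det["labels"], det["labels"].new_zeros(k)])[:k]
            scores = torch.cat([det["scores"], det["scores"].new_zeros(k)])[:k]
            return boxes, labels, scores

With a wrapper like this exported instead of the bare model, the dynamic “detections” axis (and the -1 dims in the TRTIS output config) can be replaced by a fixed size such as 100, which sidesteps the zero-size reduction entirely.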
