ONNX-exported torchvision FasterRCNN fails on inference request
Description
An internal ONNX Runtime error related to tensor dims occurs when running an ONNX-exported torchvision
FasterRCNN on TRTIS. The error is as follows:
[E:onnxruntime:, sequential_executor.cc:183 Execute] Non-zero status code returned while running ReduceMax node. Name:'' Status Message: /workspace/onnxruntime/onnxruntime/core/providers/cuda/reduction/reduction_ops.cc:110 onnxruntime::common::Status onnxruntime::cuda::PrepareForReduce(onnxruntime::OpKernelContext*, bool, const std::vector<long int>&, const onnxruntime::Tensor**, onnxruntime::Tensor**, int64_t&, int64_t&, std::vector<long int>&, std::vector<long int>&, std::vector<long int>&) keepdims || dim != 0 was false. Can't reduce on dim with value of 0 if 'keepdims' is false. Invalid output shape would be produced. input_shape:{0,4}
Stacktrace:
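For context, the failing op can be reproduced in isolation. This is a hypothetical minimal repro (not taken from the original report), assuming a CUDA-enabled onnxruntime build: a ReduceMax with keepdims=0 applied to an empty {0, 4} tensor, matching the input_shape in the error above.

import numpy as np
import onnxruntime as ort
from onnx import TensorProto, helper

# One-node graph: ReduceMax over axis 0 with keepdims=0, as in the
# exported FasterRCNN post-processing.
node = helper.make_node("ReduceMax", ["x"], ["y"], axes=[0], keepdims=0)
graph = helper.make_graph(
    [node], "reduce_repro",
    [helper.make_tensor_value_info("x", TensorProto.FLOAT, ["n", 4])],
    [helper.make_tensor_value_info("y", TensorProto.FLOAT, [4])])
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 11)])

sess = ort.InferenceSession(model.SerializeToString(),
                            providers=["CUDAExecutionProvider"])
# Feeding a 0-row tensor triggers the PrepareForReduce check on the
# CUDA provider; behavior may differ on the CPU provider.
sess.run(None, {"x": np.zeros((0, 4), np.float32)})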
TRTIS Information
Running the container with tag nvcr.io/nvidia/tensorrtserver:20.02-py3
To Reproduce
The model is a torchvision.models.detection.FasterRCNN, exported as follows:
import os

import torch

# model, images, and output_dir are assumed to be defined earlier:
# model is the FasterRCNN instance, images its example input.
outputs = ["boxes", "labels", "scores"]
dynamic_axes_dict = {output_name: {0: "detections"}
                     for output_name in outputs}

torch.onnx.export(model, images, os.path.join(output_dir, "model.onnx"),
                  export_params=True,        # store weights in the model file
                  do_constant_folding=True,  # constant folding for optimization
                  opset_version=11,          # opset 11 required for maskrcnn export
                  input_names=["images"],
                  output_names=outputs,
                  dynamic_axes=dynamic_axes_dict,
                  # keep_initializers_as_inputs=True,
                  verbose=True)
The exported model's output has been validated against the output of the native PyTorch checkpoint. Everything looks good when the model is run locally in an onnxruntime session.
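For reference, the local validation was along these lines; a minimal sketch assuming model, images, and output_dir from the export snippet above, with illustrative tolerances:

import os

import numpy as np
import onnxruntime as ort
import torch

# Run the exported graph locally; the ONNX input is the single CHW
# image tensor (torchvision's forward takes a list of such tensors).
sess = ort.InferenceSession(os.path.join(output_dir, "model.onnx"))
onnx_boxes, onnx_labels, onnx_scores = sess.run(
    None, {"images": images[0].cpu().numpy()})

# Compare against the native PyTorch outputs.
model.eval()
with torch.no_grad():
    torch_out = model(images)[0]
np.testing.assert_allclose(torch_out["boxes"].cpu().numpy(), onnx_boxes,
                           rtol=1e-3, atol=1e-5)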
The TRTIS config is as follows:
name: "torch_detection_rcnn"
platform: "onnxruntime_onnx"
input [
{
name: "images"
data_type: TYPE_FP32
format: FORMAT_NCHW
dims: [ 3, 745, 1324 ]
}
]
output [
{
name: "boxes"
data_type: TYPE_FP32
dims: [ -1, 4 ]
},
{
name: "labels"
data_type: TYPE_INT64
dims: [ -1 ]
},
{
name: "scores"
data_type: TYPE_FP32
dims: [ -1 ]
}
]
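For completeness, the failing request looked roughly like the following. This sketch uses the current tritonclient package rather than the older tensorrtserver client API that shipped with the 20.02 container, and the server URL is an assumption:

import numpy as np
import tritonclient.http as httpclient

# Connect to the server's HTTP endpoint (URL is an assumption).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Build the single FP32 CHW input declared in the config above.
image = np.random.rand(3, 745, 1324).astype(np.float32)
infer_input = httpclient.InferInput("images", list(image.shape), "FP32")
infer_input.set_data_from_numpy(image)

# This is the request during which the server-side ReduceMax error
# surfaces (per the analysis below, when the image yields no detections).
result = client.infer("torch_detection_rcnn", inputs=[infer_input])
boxes = result.as_numpy("boxes")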
The model is loaded by TRTIS with no complaints, but the error occurs during inference request handling.
Expected behavior
Given that I'm able to validate the model and load it onto TRTIS just fine, I expected it to handle inference requests just fine as well.
I’m not sure where to go from here, so I had a few questions.
- Is there a particular opset we should be exporting with? I'm using 11 because torch necessitates it.
- Should we be exporting it with all initializers kept as inputs? I tried this and ran into a different error that was more vague.
I'm at a bit of a roadblock, so let me know if you need any more information. Also, please let me know if y'all have any suggestions for what to look into and experiment with from here.
Top GitHub Comments
Is this still not working? If so, please reopen.
Apologies for the delayed response. This issue should remain closed, but I just wanted to update the info here.
I believe the problem arises from the dynamic axes (for the number of detections). It appears that something in the exported ONNX graph doesn't account for a dynamic axis potentially being 0 (no detections): an image with zero detections yields an empty {0, 4} boxes tensor, which is exactly the input_shape the ReduceMax error above complains about.
I simply modified the graph before exporting to pad all output tensors to a fixed number of elements and was able to bypass this issue.
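The fix was along these lines; a minimal sketch of a wrapper module that pads the detection outputs to a fixed size before export (MAX_DETECTIONS, the class name, and the zero padding values are illustrative, not the exact graph edit used):

import torch
import torch.nn.functional as F

MAX_DETECTIONS = 100  # illustrative fixed size

class PaddedRCNN(torch.nn.Module):
    """Wraps a torchvision detection model so every output has a fixed first dim."""

    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, images):
        detections = self.model(images)[0]
        # Assumes n <= MAX_DETECTIONS (torchvision already caps
        # detections per image via box_detections_per_img).
        pad = MAX_DETECTIONS - detections["boxes"].shape[0]
        # Zero-pad each output along dim 0 so no axis ever has size 0.
        boxes = F.pad(detections["boxes"], (0, 0, 0, pad))
        labels = F.pad(detections["labels"], (0, pad))
        scores = F.pad(detections["scores"], (0, pad))
        return boxes, labels, scores

With fixed-size outputs, the dynamic_axes argument can be dropped from the export and the -1 dims in the TRTIS config replaced with MAX_DETECTIONS. Note that whether the pad amount stays dynamic depends on the export path; the original fix modified the graph directly, and this wrapper is just one way to get the same effect.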