
ONNX export of FasterRCNN: inference fails when no detections are present


🐛 Bug

I am running into a similar issue to the one reported in https://github.com/pytorch/vision/issues/2251, where my exported ONNX model fails to run inference when no detections are present, but for FasterRCNN instead of MaskRCNN.

Running inference on a random tensor that will not produce detections raises a similar runtime exception:

RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running ReduceMax node. Name:'ReduceMax_1814' Status Message: /onnxruntime_src/onnxruntime/core/providers/cuda/reduction/reduction_ops.cc:351 onnxruntime::common::Status onnxruntime::cuda::PrepareForReduce(onnxruntime::OpKernelContext*, bool, const std::vector<long int>&, const onnxruntime::Tensor**, onnxruntime::Tensor**, int64_t&, int64_t&, std::vector<long int>&, std::vector<long int>&, std::vector<long int>&, int64_t&, int64_t&) keepdims || dim != 0 was false. Can't reduce on dim with value of 0 if 'keepdims' is false. Invalid output shape would be produced. input_shape:{0,4}

I updated my environment to use recent torch, torchvision, and onnxruntime versions as instructed in the closed issue, but I still hit the same runtime exception.
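
The failing node can be isolated from the detection graph entirely: ReduceMax has no identity value, so ONNX Runtime refuses to reduce an axis of length 0 unless keepdims is set, exactly as the error message states. Below is a minimal standalone sketch of the same operator behavior, built by hand with onnx.helper (the node and tensor names here are illustrative, not taken from the exported graph):

    import numpy as np
    import onnx
    from onnx import TensorProto, helper
    import onnxruntime

    # One ReduceMax over axis 0 with keepdims=0, mirroring ReduceMax_1814
    node = helper.make_node("ReduceMax", ["boxes"], ["max"], axes=[0], keepdims=0)
    graph = helper.make_graph(
        [node],
        "reduce_repro",
        [helper.make_tensor_value_info("boxes", TensorProto.FLOAT, ["n", 4])],
        [helper.make_tensor_value_info("max", TensorProto.FLOAT, [4])],
    )
    model_proto = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 11)])

    sess = onnxruntime.InferenceSession(model_proto.SerializeToString())
    # Fine for a non-empty input...
    sess.run(None, {"boxes": np.random.rand(3, 4).astype(np.float32)})
    # ...but an empty (0, 4) detection tensor triggers the same
    # "Can't reduce on dim with value of 0" failure
    sess.run(None, {"boxes": np.zeros((0, 4), dtype=np.float32)})

This is why the exported model only fails on inputs that yield zero detections: the post-processing subgraph reduces over the detection axis, which is empty in that case.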

To Reproduce

Steps to reproduce the behavior (mostly copied from the closed issue):

  1. Export a pretrained FasterRCNN model to ONNX (imports and a concrete model/input setup are filled in below so the snippet runs end to end; the resnet50_fpn variant is an assumption, since the issue does not pin one):

    import torch
    import torchvision

    # Pretrained detection model in eval mode; export traces the full
    # inference path, including the box post-processing that fails later
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    model.eval()

    # One random 800x800 RGB image, wrapped as ([img],) the way the
    # torchvision ONNX tests pass detection inputs to torch.onnx.export
    inputs = ([torch.randn(3, 800, 800)],)
    onnx_model_filepath = "fasterrcnn.onnx"

    torch.onnx.export(
        model,
        inputs,
        onnx_model_filepath,
        opset_version=11,
        do_constant_folding=True,
        verbose=True,
        input_names=["data"],
        output_names=["boxes", "labels", "scores"],
        # Image height/width and the number of detections vary at runtime
        dynamic_axes={
            "data": [1, 2],
            "boxes": [0],
            "labels": [0],
            "scores": [0]
        }
    )
    
  2. Run inference on an image that produces detections and observe that it succeeds:

    import onnxruntime

    # input_tensor: a preprocessed 3xHxW image tensor for an image
    # known to contain detectable objects
    ort_session = onnxruntime.InferenceSession(onnx_model_filepath)
    input_array = input_tensor.cpu().numpy()
    ort_inputs = {"data": input_array}
    ort_outputs = ort_session.run(None, ort_inputs)
    
  3. Run inference on a random tensor that produces no detections and hit the runtime exception quoted above (a parity check covering both cases is sketched after these steps):

    # A random tensor is all but guaranteed to yield zero detections
    # above the score threshold
    random_tensor = torch.randn(input_tensor.shape)
    random_array = random_tensor.cpu().numpy()
    ort_inputs = {"data": random_array}
    ort_outputs = ort_session.run(None, ort_inputs)  # raises the RuntimeException
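
Once the underlying bug is fixed, both cases should simply match the eager model's output, so steps 2 and 3 can be folded into a single comparison. A sketch; check_parity is a hypothetical helper, not part of the original repro:

    import numpy as np

    def check_parity(model, image_tensor, ort_session, atol=1e-4):
        # Reference output from the eager PyTorch model
        with torch.no_grad():
            torch_out = model([image_tensor])[0]

        # Same input through the exported ONNX model
        ort_boxes, ort_labels, ort_scores = ort_session.run(
            None, {"data": image_tensor.cpu().numpy()}
        )

        # The two paths should agree, including the empty (0, 4) case
        np.testing.assert_allclose(torch_out["boxes"].numpy(), ort_boxes, atol=atol)
        np.testing.assert_array_equal(torch_out["labels"].numpy(), ort_labels)
        np.testing.assert_allclose(torch_out["scores"].numpy(), ort_scores, atol=atol)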
    

Expected behavior

I expect output from the ONNX-exported FasterRCNN similar to the output from the PyTorch version:

[{'boxes': tensor([], size=(0, 4), grad_fn=<StackBackward>),
  'labels': tensor([], dtype=torch.int64),
  'scores': tensor([], grad_fn=<IndexBackward>)}]
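
Running the eager model on the random tensor from step 3 should produce exactly these empty shapes (a quick check, reusing model and random_tensor from the repro steps):

    with torch.no_grad():
        eager_out = model([random_tensor])
    print(eager_out[0]["boxes"].shape)   # torch.Size([0, 4])
    print(eager_out[0]["labels"].shape)  # torch.Size([0])
    print(eager_out[0]["scores"].shape)  # torch.Size([0])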

Environment

PyTorch version: 1.7.0a0+4102fbd
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.10.2

Python version: 3.8
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration: 
GPU 0: TITAN Xp
GPU 1: TITAN Xp
GPU 2: TITAN Xp

Nvidia driver version: 440.64.00
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5

Versions of relevant libraries:
[pip3] numpy==1.18.5
[pip3] torch==1.7.0a0+4102fbd
[pip3] torchvision==0.8.0a0+bea6127
[conda] magma-cuda101             2.5.2                         1    pytorch
[conda] mkl                       2020.1                      217  
[conda] mkl-include               2020.1                      219    conda-forge
[conda] numpy                     1.18.5           py38h8854b6b_0    conda-forge
[conda] torch                     1.7.0a0+4102fbd          pypi_0    pypi
[conda] torchvision               0.8.0a0+bea6127          pypi_0    pypi

!pip freeze | grep onnx

onnx==1.7.0
onnxruntime-gpu==1.3.0

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

2 reactions
zhiqwang commented, Jul 4, 2020

@neginraoof I have tested it with torch 1.5.1 and torchvision 0.6.1 (a.k.a. 0.6.0a0+35d732a). The PyTorch inference results are equal to ORT's.

@drwaltman An early-June nightly had a bug in the test_onnx.py unit test. After updating to the June 30 nightly (at that point torch was 1.7.0.dev20200626 and torchvision was 0.8.0.dev20200629; ignore the time-zone difference), it was resolved.
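
Before re-running the export against the fixed nightlies, it is worth confirming which builds are actually active in the environment (a trivial check):

    import torch
    import torchvision

    # The fix needs both late-June 2020 nightlies (or anything newer)
    print(torch.__version__)        # expect 1.7.0.dev20200626 or later
    print(torchvision.__version__)  # expect 0.8.0.dev20200629 or later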

2 reactions
drwaltman commented, Jul 3, 2020

Hi @zhiqwang and @fmassa, thanks for the responses!

Running inference on the random tensor is now working after updating torch to 1.7.0.dev20200626 and torchvision to 0.8.0.dev20200629! @zhiqwang, how did you determine that these two versions in particular were the ones to use?

Regarding the repro-steps typo: I was using “data” instead of “input” for the input keys in local testing, but pasted the incorrect code by accident (I was trying to match the examples from the previous issue), my bad!

Thanks again!
