ONNX export of FasterRCNN: inference fails when no detections are present
🐛 Bug
I am running into an issue similar to the one reported in https://github.com/pytorch/vision/issues/2251, where an exported ONNX model fails to run inference when no detections are present. In my case the model is FasterRCNN rather than MaskRCNN.
Running inference on a random tensor that will not create detections results in a similar runtime exception:
RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running ReduceMax node. Name:'ReduceMax_1814' Status Message: /onnxruntime_src/onnxruntime/core/providers/cuda/reduction/reduction_ops.cc:351 onnxruntime::common::Status onnxruntime::cuda::PrepareForReduce(onnxruntime::OpKernelContext*, bool, const std::vector<long int>&, const onnxruntime::Tensor**, onnxruntime::Tensor**, int64_t&, int64_t&, std::vector<long int>&, std::vector<long int>&, std::vector<long int>&, int64_t&, int64_t&) keepdims || dim != 0 was false. Can't reduce on dim with value of 0 if 'keepdims' is false. Invalid output shape would be produced. input_shape:{0,4}
I updated my environment to use recent torch, torchvision, and onnxruntime versions as instructed in the closed issue, but I still hit the same runtime exception.
To Reproduce
Steps to reproduce the behavior (mostly copied from closed issue):
- Export a pretrained FasterRCNN model to ONNX:
    torch.onnx.export(
        model,
        inputs,
        onnx_model_filepath,
        opset_version=11,
        do_constant_folding=True,
        verbose=True,
        input_names=["data"],
        output_names=["boxes", "labels", "scores"],
        dynamic_axes={
            "data": [1, 2],
            "boxes": [0],
            "labels": [0],
            "scores": [0],
        },
    )
- Run inference on an image that will result in detections and see output without failure:
    ort_session = onnxruntime.InferenceSession(onnx_model_name)
    input_array = input_tensor.cpu().numpy()
    ort_inputs = {"data": input_array}
    ort_outputs = ort_session.run(None, ort_inputs)
- Run inference on an image that will not result in detections and hit the runtime exception provided above:
    random_tensor = torch.randn(input_tensor.shape)
    random_array = random_tensor.cpu().numpy()
    ort_inputs = {"data": random_array}
    ort_outputs = ort_session.run(None, ort_inputs)
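To compare the two inference runs above side by side without the script aborting on the first exception, a small wrapper can help. This is only a sketch: `run_and_report` is a hypothetical helper name, and it assumes nothing about the session beyond a `run(output_names, input_feed)` method like the one called in the steps above.

```python
from typing import Any, Dict, Tuple


def run_and_report(session: Any, feed: Dict[str, Any]) -> Tuple[bool, Any]:
    """Run one inference call and report either the outputs or the error.

    `session` is assumed to expose run(output_names, input_feed), as the
    ort_session object in the repro steps does. Hypothetical helper.
    """
    try:
        outputs = session.run(None, feed)
    except Exception as exc:  # catch whatever exception type the runtime raises
        return False, f"{type(exc).__name__}: {exc}"
    return True, outputs
```

With the names from the steps above, usage would look like `ok, result = run_and_report(ort_session, {"data": random_array})`; `ok` is `False` and `result` holds the error text when the run fails.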
Expected behavior
I am expecting output from the ONNX exported FasterRCNN that is similar to the output from the PyTorch version:
[{'boxes': tensor([], size=(0, 4), grad_fn=<StackBackward>),
'labels': tensor([], dtype=torch.int64),
'scores': tensor([], grad_fn=<IndexBackward>)}]
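For reference, the check I would like to pass on the ONNX side can be sketched as below. It assumes the exported model returns its outputs in the order `boxes`, `labels`, `scores` (matching `output_names` in the export call) and that each output has a `.shape` attribute, as NumPy arrays do. `is_empty_detection` is a hypothetical helper name.

```python
from typing import Any, Sequence


def is_empty_detection(outputs: Sequence[Any]) -> bool:
    """Return True if (boxes, labels, scores) describe zero detections.

    Mirrors the PyTorch output shown above: boxes of shape (0, 4) and
    one-dimensional, empty labels and scores. Hypothetical helper.
    """
    boxes, labels, scores = outputs
    return (
        tuple(boxes.shape) == (0, 4)
        and tuple(labels.shape) == (0,)
        and tuple(scores.shape) == (0,)
    )
```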
Environment
PyTorch version: 1.7.0a0+4102fbd
Is debug build: No
CUDA used to build PyTorch: 10.1
OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.10.2
Python version: 3.8
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration:
GPU 0: TITAN Xp
GPU 1: TITAN Xp
GPU 2: TITAN Xp
Nvidia driver version: 440.64.00
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
Versions of relevant libraries:
[pip3] numpy==1.18.5
[pip3] torch==1.7.0a0+4102fbd
[pip3] torchvision==0.8.0a0+bea6127
[conda] magma-cuda101 2.5.2 1 pytorch
[conda] mkl 2020.1 217
[conda] mkl-include 2020.1 219 conda-forge
[conda] numpy 1.18.5 py38h8854b6b_0 conda-forge
[conda] torch 1.7.0a0+4102fbd pypi_0 pypi
[conda] torchvision 0.8.0a0+bea6127 pypi_0 pypi
!pip freeze | grep onnx
onnx==1.7.0
onnxruntime-gpu==1.3.0
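The package versions above can also be collected from inside Python, which is handy when comparing environments. A minimal sketch using only the standard library (Python 3.8+); `installed_versions` is a hypothetical helper name, and the package list is just the set relevant to this issue.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    packages = ["torch", "torchvision", "onnx", "onnxruntime", "onnxruntime-gpu"]
    for name, version in installed_versions(packages).items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```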
Comments: 6 (5 by maintainers)
@neginraoof I have tested it with torch 1.5.1 and torchvision 0.6.1 (aka 0.6.0a0+35d732a). The PyTorch inference results match ONNX Runtime's.
@drwaltman In an early June nightly version, there was a bug in the test_onnx.py unit test. After updating to the June 30 nightly (at that time torch was 1.7.0.dev20200626 and torchvision was 0.8.0.dev20200629; ignore the time zone difference), it is resolved.
Hi @zhiqwang and @fmassa, thanks for the responses!
Running inference on the random tensor now works after updating torch to 1.7.0.dev20200626 and torchvision to 0.8.0.dev20200629! @zhiqwang how did you decide on these two versions in particular?
Regarding the typo in the repro steps: I was using “data” instead of “input” for the input keys in local testing, but pasted the wrong code by accident while trying to match the examples from the previous issue. My bad!
Thanks again!
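Since the fix depends on which nightly build is installed, it can be useful to read the build date out of a version string. The sketch below assumes the `.devYYYYMMDD` suffix seen in the versions quoted in this thread encodes the build date; `nightly_date` is a hypothetical helper name, and versions without that suffix (such as source builds like 1.7.0a0+4102fbd) yield `None`.

```python
import re
from datetime import date
from typing import Optional

# Matches the ".devYYYYMMDD" part of versions such as 1.7.0.dev20200626.
_NIGHTLY = re.compile(r"\.dev(\d{4})(\d{2})(\d{2})")


def nightly_date(version: str) -> Optional[date]:
    """Return the build date encoded in a nightly version string, if any."""
    match = _NIGHTLY.search(version)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)
```

For example, `nightly_date(torch.__version__)` could be compared against `date(2020, 6, 26)` to see whether an installed nightly is at least as recent as the one reported to work here.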