ONNX export of MaskRCNN: dynamic axes seem broken (for batch size > 1)
🐛 Bug
At export to ONNX, dynamic axes were set and the inputs and outputs were named properly. However, the outputs returned at inference are incorrect and wrongly named. The behaviour depends on the batch sizes used at export and at inference.
Suppose the batch size at export time is n and the batch size at inference time is m:
1. If n == m: the output has length n*4, so for example if n = m = 3, the output has length 12. In the ONNX Runtime session, as the screenshot shows, only the output of the first image in the batch is correctly named.
2. If n < m: similar behaviour as in 1., but output is returned only for the first n images in the batch.
3. If n > m: a "SplitToSequence_XXXX" error is raised, for example:
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running SplitToSequence node. Name:'SplitToSequence_4001' Status Message: split_size_sum (57) != split_dim_size (40)
This exception is similar to the behaviour listed in #2309 and seems connected.
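The status message compares two numbers: the sum of the requested split sizes (57) and the size of the dimension being split (40). Below is a minimal pure-Python sketch of that kind of check, to make the message easier to read. It is not ONNX Runtime's implementation; `split_to_sequence` is a hypothetical helper and the split sizes in the usage note are made up. One possible reading, which is only an assumption, is that the split sizes still describe the n images seen at export while the tensor holds detections for just m images.

```python
from typing import List, Sequence


def split_to_sequence(rows: Sequence, split_sizes: Sequence[int]) -> List[list]:
    """Split `rows` into consecutive chunks whose lengths are `split_sizes`.

    The chunk lengths must add up to the length of the dimension being
    split; otherwise the same kind of mismatch as in the error is raised.
    """
    split_size_sum = sum(split_sizes)
    split_dim_size = len(rows)
    if split_size_sum != split_dim_size:
        raise ValueError(
            f"split_size_sum ({split_size_sum}) != split_dim_size ({split_dim_size})"
        )
    chunks, start = [], 0
    for size in split_sizes:
        chunks.append(list(rows[start:start + size]))
        start += size
    return chunks
```

For instance, `split_to_sequence(list(range(40)), [20, 20])` succeeds, while `split_to_sequence(list(range(40)), [20, 20, 17])` raises a `ValueError` with the same 57 versus 40 mismatch as the reported message.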
To Reproduce
Steps to reproduce the behavior:
1. Load and export a pretrained MaskRCNN model using input_tensor of shape (n, 3, 1024, 1024), for example (4, 3, 1024, 1024):
import torch
import torchvision

# num_classnames, image_mean, image_std, input_tensor and onnx_model_filepath
# are defined elsewhere in the reporter's script.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(
    pretrained=False,
    min_size=1024, max_size=1024,
    pretrained_backbone=False,
    num_classes=num_classnames + 1,  # + background class
    image_mean=image_mean,
    image_std=image_std,
)
torch.onnx.export(
    model,
    input_tensor.float(),
    onnx_model_filepath,
    export_params=True,
    opset_version=12,
    do_constant_folding=False,
    input_names=["images_tensors"],
    output_names=["boxes", "labels", "scores", "masks"],
    # Every listed axis of each named tensor is marked as dynamic.
    dynamic_axes={"images_tensors": [0, 1, 2, 3], "boxes": [0, 1], "labels": [0],
                  "scores": [0], "masks": [0, 1, 2, 3]},
)
2. Load the ONNX model and run inference on input_tensor of shape (m, 3, 1024, 1024), where m corresponds to the value in the description above. Different values of m (bigger than, smaller than or equal to n) will result in different behaviours.
import onnxruntime

input_array = input_tensor.cpu().numpy()
ort_session = onnxruntime.InferenceSession(onnx_model_filepath)
ort_inputs = {"images_tensors": input_array}
ort_outs = ort_session.run(None, ort_inputs)  # None requests all outputs
outputs = ort_session.get_outputs()  # output metadata (names, shapes, types)
These outputs are presented in the screenshot above.
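Since the screenshot may not be available to every reader, the same information can be printed as text. The sketch below pairs each declared output name with the shape of the corresponding returned array. `describe_outputs` is a hypothetical helper, not part of ONNX Runtime; it assumes that `session.run(None, ...)` returns results in the same order as `session.get_outputs()` and that each result exposes a `.shape` attribute, as NumPy arrays do.

```python
from typing import Any, List, Tuple


def describe_outputs(session: Any, results: List[Any]) -> List[Tuple[str, tuple]]:
    """Pair each declared output name with the shape of the returned array.

    `session` only needs a get_outputs() method whose items have a `.name`
    attribute; `results` is the list returned by session.run(None, inputs).
    """
    names = [node.name for node in session.get_outputs()]
    return [(name, tuple(result.shape)) for name, result in zip(names, results)]
```

With the variables from the steps above, `for name, shape in describe_outputs(ort_session, ort_outs): print(name, shape)` would list which of the n*4 outputs carry the expected names.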
Expected behavior
With dynamic_axes set properly, I expect:
- Output length dependent on batch size of the inferred tensor, not the one used for export.
- Output of shape similar to the torch model's output, which is a list (len == batch size) of dictionaries of boxes, labels, scores and masks. Also, all outputs correctly named, not like currently in the above screenshot.
- No exceptions if inferred tensor is of smaller batch size than the one used for export.
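The expected structure can be illustrated by regrouping a flat output list into one dictionary per image. This is only a sketch of the desired shape, not a fix for the export: `group_outputs` and `FIELDS` are hypothetical names, and the code assumes the flat list is ordered image by image as boxes, labels, scores, masks, which the report does not confirm.

```python
from typing import Any, Dict, List, Sequence

# Assumed per-image order of the flat outputs (not confirmed by the report).
FIELDS = ("boxes", "labels", "scores", "masks")


def group_outputs(flat_outputs: Sequence[Any]) -> List[Dict[str, Any]]:
    """Regroup a flat list of length batch_size * 4 into per-image dicts."""
    step = len(FIELDS)
    if len(flat_outputs) % step != 0:
        raise ValueError(
            f"expected a multiple of {step} outputs, got {len(flat_outputs)}"
        )
    return [
        dict(zip(FIELDS, flat_outputs[i:i + step]))
        for i in range(0, len(flat_outputs), step)
    ]
```

Applied to the 12 outputs of the n = m = 3 case, this would yield a list of three dictionaries, matching the length and keys of the torch model's output.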
Environment
PyTorch version: 1.6.0.dev20200526+cu101
Is debug build: No
CUDA used to build PyTorch: 10.1
OS: Ubuntu 20.04 LTS
GCC version: (Ubuntu 9.3.0-10ubuntu2) 9.3.0
CMake version: version 3.16.3
Python version: 3.8
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration:
GPU 0: GeForce RTX 2080 Ti
GPU 1: GeForce RTX 2080 Ti
GPU 2: GeForce RTX 2080 Ti
GPU 3: GeForce RTX 2080 Ti
Nvidia driver version: 440.64
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.4
Versions of relevant libraries:
[pip3] numpy==1.18.4
[pip3] torch==1.6.0.dev20200526+cu101
[pip3] torchvision==0.7.0.dev20200526+cu101
[conda] Could not collect
Also:
ONNX_runtime and ONNX_runtime_gpu==1.3.0
ONNX==1.7.0
Additional context
Issue Analytics
- State:
- Created 3 years ago
- Reactions:1
- Comments:6 (3 by maintainers)
Same problem. Is there any plan to support dynamic batch inference now?
@neginraoof
Has this been addressed, by any chance? I am trying to play around with dynamic_axes with torchvision models (faster_rcnn, mask_rcnn, etc.) and I cannot seem to get it to work. I asked on the forums too, with no luck.