ONNX tests failing on master
🐛 Bug
It seems that the ONNX tests are failing today on the latest master, and the problem is probably related to changes upstream.
This was originally spotted on an unrelated PR, but to confirm we reran the tests on the previous day's passing master and they failed with the following errors:
======================================================================
ERROR: test_faster_rcnn (__main__.ONNXExporterTester)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test/test_onnx.py", line 376, in test_faster_rcnn
tolerate_small_mismatch=True)
File "test/test_onnx.py", line 53, in run_model
self.ort_validate(onnx_io, test_inputs, test_ouputs, tolerate_small_mismatch)
File "test/test_onnx.py", line 72, in ort_validate
ort_outs = ort_session.run(None, ort_inputs)
File "/home/circleci/.local/lib/python3.7/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 124, in run
return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running ReduceMax node. Name:'ReduceMax_1833' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/reduction/reduction_ops.cc:487 void onnxruntime::CommonReduce(onnxruntime::OpKernelContext*, std::vector<long int>, int64_t, onnxruntime::ResultsNoTransposePrepareForReduce&, bool) [with T = float; AGG = onnxruntime::ReduceAggregatorMax<float, float>; int64_t = long int] keepdims_ was false. Can't reduce on dim with value of 0 if 'keepdims' is false. Invalid output shape would be produced. input_shape:{0,4}
======================================================================
ERROR: test_keypoint_rcnn (__main__.ONNXExporterTester)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test/test_onnx.py", line 477, in test_keypoint_rcnn
tolerate_small_mismatch=True)
File "test/test_onnx.py", line 53, in run_model
self.ort_validate(onnx_io, test_inputs, test_ouputs, tolerate_small_mismatch)
File "test/test_onnx.py", line 72, in ort_validate
ort_outs = ort_session.run(None, ort_inputs)
File "/home/circleci/.local/lib/python3.7/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 124, in run
return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running ReduceMax node. Name:'ReduceMax_1833' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/reduction/reduction_ops.cc:487 void onnxruntime::CommonReduce(onnxruntime::OpKernelContext*, std::vector<long int>, int64_t, onnxruntime::ResultsNoTransposePrepareForReduce&, bool) [with T = float; AGG = onnxruntime::ReduceAggregatorMax<float, float>; int64_t = long int] keepdims_ was false. Can't reduce on dim with value of 0 if 'keepdims' is false. Invalid output shape would be produced. input_shape:{0,4}
======================================================================
ERROR: test_mask_rcnn (__main__.ONNXExporterTester)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test/test_onnx.py", line 429, in test_mask_rcnn
tolerate_small_mismatch=True)
File "test/test_onnx.py", line 53, in run_model
self.ort_validate(onnx_io, test_inputs, test_ouputs, tolerate_small_mismatch)
File "test/test_onnx.py", line 72, in ort_validate
ort_outs = ort_session.run(None, ort_inputs)
File "/home/circleci/.local/lib/python3.7/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 124, in run
return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running ReduceMax node. Name:'ReduceMax_1833' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/reduction/reduction_ops.cc:487 void onnxruntime::CommonReduce(onnxruntime::OpKernelContext*, std::vector<long int>, int64_t, onnxruntime::ResultsNoTransposePrepareForReduce&, bool) [with T = float; AGG = onnxruntime::ReduceAggregatorMax<float, float>; int64_t = long int] keepdims_ was false. Can't reduce on dim with value of 0 if 'keepdims' is false. Invalid output shape would be produced. input_shape:{0,4}
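All three failures point at the same ReduceMax node, and the reported input_shape:{0,4} suggests a reduction over an empty boxes tensor, which the detection models can produce when nothing is detected. Purely for illustration, here is a minimal sketch that reproduces the same class of runtime error; the module, names and opset below are made up and say nothing about where the ReduceMax sits in the real exported graph:

```python
import io

import numpy as np
import onnxruntime as ort
import torch


class ReduceOverBoxes(torch.nn.Module):
    def forward(self, boxes):
        # max over dim 0 with keepdim=False lowers to a ReduceMax node; when
        # boxes has shape (0, 4) that is exactly the "Can't reduce on dim with
        # value of 0 if 'keepdims' is false" case from the logs above.
        return boxes.max(dim=0, keepdim=False)[0]


f = io.BytesIO()
torch.onnx.export(
    ReduceOverBoxes(), (torch.rand(3, 4),), f,
    input_names=["boxes"],
    dynamic_axes={"boxes": {0: "num_boxes"}},
    opset_version=11,
)

sess = ort.InferenceSession(f.getvalue())
# Feeding an empty (0, 4) input triggers the same RUNTIME_EXCEPTION as in CI.
sess.run(None, {"boxes": np.zeros((0, 4), dtype=np.float32)})
```

If that reading is right, the question is which upstream exporter change started emitting this pattern for the *rcnn models.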
Top GitHub Comments
@datumbox I have a PR to fix this issue upstream: https://github.com/pytorch/pytorch/pull/50582. I imported the above three torchvision tests into the PyTorch test suite, they passed locally, and the torchvision tests look good "mostly" (see detail [A] below).
This PR needs some time to get merged. Under the current policy with Facebook, we merge to the pytorch branch when we have ~10 PRs in a batch, so we estimate the merge may happen in around 10-14 days. That means the torchvision test_onnx will stay red during that time. Do you have any comments on this? Thanks.
Detail [A]: When I ran the torchvision tests against this PR, test_faster_rcnn and test_mask_rcnn passed, but test_keypoint_rcnn failed on a single data point out of 561: With rtol=0.001 and atol=1e-05, found 1 element(s) (out of 561) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 2.6005320250988007e-05 (-0.014647360891103745 vs. -0.014621355570852757), which occurred at index (29, 4).
The difference corresponds to roughly rtol=0.0017 and atol=2.7e-5, slightly larger than the bounds rtol=0.001 and atol=1e-05. I feel this is acceptable - we can relax the error tolerance to unblock the torchvision unit tests. Further analysis can be handled in a separate issue.
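For reference, a rough sketch of what relaxing the bound could look like; the actual comparison helper and its signature in test/test_onnx.py may differ, and assert_outputs_close below is just a placeholder name:

```python
import torch

# Hypothetical helper name; the real comparison lives in ort_validate in
# test/test_onnx.py. The only change proposed here is widening atol from
# 1e-05 to 1e-04 so the single 2.6e-05 discrepancy above no longer fails.
def assert_outputs_close(expected, actual, rtol=1e-03, atol=1e-04):
    # ONNX Runtime returns numpy arrays, so convert both sides to tensors
    # before comparing.
    torch.testing.assert_allclose(torch.as_tensor(actual),
                                  torch.as_tensor(expected),
                                  rtol=rtol, atol=atol)
```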
@jiafatom Thanks for looking into it.
We are currently completing the work of adding FasterRCNN with a MobileNetV3 backbone (#3253). Given that this bug affects the tests of the *rcnn models, it makes it hard to confirm that the new model will be ONNX compatible. I wonder if your team could land the PR sooner as an exception for this use-case?