onnx export model differs too much from pytorch model

Thanks for your error report and we appreciate it a lot.

Describe the bug

I trained Faster R-CNN on a custom COCO dataset and exported it to ONNX using pytorch2onnx.py. The ONNX model's results differ too much from those of the PyTorch model.

I retried training and exporting with RetinaNet, and this problem did not occur.

Reproduction

  1. What command or script did you run?
python /home/jovyan/work/mmdetection/tools/train.py \
    /home/jovyan/work/vdetection/work_dirs/chess_faster_mmdet2/faster.py
python /home/jovyan/work/mmdetection/tools/deployment/pytorch2onnx.py \
    /home/jovyan/work/vdetection/work_dirs/chess_faster_mmdet2/faster.py \
    /home/jovyan/work/vdetection/work_dirs/chess_faster_mmdet2/epoch_12.pth \
    --output-file /home/jovyan/work/vdetection/work_dirs/chess_faster_mmdet2/tmp.onnx --verify \
    --input-img /home/jovyan/work/vdetection/work_dirs/chess_faster_mmdet2/472613.jpg

The config, checkpoint, log, and ONNX file are available at this link.

Environment

2021-09-20 10:51:43,820 - mmdet - INFO - Environment info:

sys.platform: linux
Python: 3.8.8 (default, Feb 24 2021, 21:46:12) [GCC 7.3.0]
CUDA available: True
GPU 0,1: NVIDIA GeForce GTX TITAN X
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.3.r11.3/compiler.29745058_0
GCC: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
PyTorch: 1.9.0a0+2ecb2c7
PyTorch compiling details: PyTorch built with:

  • GCC 9.3
  • C++ Version: 201402
  • Intel® Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel® 64 architecture applications
  • Intel® MKL-DNN v1.8.0 (Git Hash N/A)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 11.3
  • NVCC architecture flags: -gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86
  • CuDNN 8.2
  • Magma 2.5.2
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.9.0a0
OpenCV: 4.5.3
MMCV: 1.3.13
MMCV Compiler: GCC 9.3
MMCV CUDA Compiler: 11.3
MMDetection: 2.16.0+0d66ba3

Error traceback

Traceback (most recent call last):
  File "/home/jovyan/work/mmdetection/tools/deployment/pytorch2onnx.py", line 325, in <module>
    pytorch2onnx(
  File "/home/jovyan/work/mmdetection/tools/deployment/pytorch2onnx.py", line 197, in pytorch2onnx
    np.testing.assert_allclose(
  File "/opt/conda/lib/python3.8/site-packages/numpy/testing/_private/utils.py", line 1528, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/opt/conda/lib/python3.8/site-packages/numpy/testing/_private/utils.py", line 761, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=0.001, atol=1e-05
The numerical values are different between Pytorch and ONNX, but it does not necessarily mean the exported ONNX model is problematic.
(shapes (4, 5), (0, 5) mismatch)
 x: array([[-2., -2., -2., -2.,  0.],
       [-2., -2., -2., -2.,  0.],
       [-2., -2., -2., -2.,  0.],
       [-2., -2., -2., -2.,  0.]], dtype=float32)
 y: array([], shape=(0, 5), dtype=float32)
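
The assertion fails on shape before any values are compared: one array has four rows and the other has none. All four rows carry a score of 0 in the last column, which suggests they may be padding rather than real detections. That reading is an assumption and is not confirmed anywhere in this thread. Below is a minimal Python sketch of a comparison that drops such rows first. `drop_low_score_rows` and `score_thr` are hypothetical names for illustration and are not part of pytorch2onnx.py.

```python
import numpy as np


def drop_low_score_rows(dets, score_thr=0.0):
    """Keep rows whose last column (assumed here to be the score) exceeds score_thr."""
    dets = np.asarray(dets, dtype=np.float32)
    return dets[dets[:, -1] > score_thr]


# Arrays copied from the assertion message above.
x = np.array([[-2.0, -2.0, -2.0, -2.0, 0.0]] * 4, dtype=np.float32)
y = np.empty((0, 5), dtype=np.float32)

# The raw comparison fails because (4, 5) and (0, 5) cannot be broadcast together.
try:
    np.testing.assert_allclose(x, y, rtol=1e-3, atol=1e-5)
    raw_match = True
except AssertionError:
    raw_match = False

# After dropping zero-score rows, both sides are empty and the check passes.
x_kept = drop_low_score_rows(x)
y_kept = drop_low_score_rows(y)
np.testing.assert_allclose(x_kept, y_kept, rtol=1e-3, atol=1e-5)
```

If the filtered arrays still differ on a real image, the mismatch is in actual detections and not only in padding, which would point back at the export itself.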

ONNX results: [image: show-ort]
PyTorch model results: [image: show-pt]

Bug fix

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

RunningLeon commented, Sep 22, 2021 (4 reactions)

@Hussni-V Could you try with PyTorch==1.8.0 and ONNXRuntime==1.8.0? It tested OK on my machine:

sys.platform: linux
Python: 3.7.5 (default, Oct 25 2019, 15:51:11) [GCC 7.3.0]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 2080
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.1.TC455_06.29190527_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~16.04) 7.5.0
PyTorch: 1.8.0
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.2-Product Build 20210312 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.0.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

TorchVision: 0.9.0
OpenCV: 4.5.2
MMCV: 1.3.13
MMCV Compiler: GCC 7.5
MMCV CUDA Compiler: 11.1
MMDetection: 2.16.0+5ef56c1

ONNXRuntime
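
The two environments in this thread differ mainly in the PyTorch build: the reporter ran 1.9.0a0+2ecb2c7, while the maintainer tested with 1.8.0. A small sketch for checking a local install against the maintainer's combination, using only the standard library (Python 3.8+). The distribution names `torch` and `onnxruntime` are assumptions about how the packages were installed (a GPU install may be registered as `onnxruntime-gpu` instead), and `compare_with_tested` is a hypothetical helper.

```python
from importlib import metadata

# Versions the maintainer reported as working in the comment above.
TESTED = {"torch": "1.8.0", "onnxruntime": "1.8.0"}


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def compare_with_tested(tested=TESTED):
    """Map each package name to (installed, tested, matches)."""
    report = {}
    for name, wanted in tested.items():
        found = installed_version(name)
        # Ignore a local build tag such as "+cu111" when comparing.
        base = found.split("+")[0] if found is not None else None
        report[name] = (found, wanted, base == wanted)
    return report


if __name__ == "__main__":
    for name, (found, wanted, ok) in compare_with_tested().items():
        print(f"{name}: installed={found} tested={wanted} match={ok}")
```

A pre-release build such as 1.9.0a0 is reported as a mismatch, which is the case worth ruling out before debugging the exported model.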

RunningLeon commented, Sep 24, 2021 (1 reaction)

@jshilong We have this in the docs: https://github.com/open-mmlab/mmdetection/blob/master/docs/tutorials/pytorch2onnx.md#list-of-supported-models-exportable-to-onnx

Notes:

Minimum required version of MMCV is 1.3.5

All models above are tested with Pytorch==1.6.0 and onnxruntime==1.5.1, except for CornerNet. For more details about the torch version when exporting CornerNet to ONNX, which involves mmcv::cummax, please refer to the Known Issues in mmcv.
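
When checking a minimum-version note like the one above by hand, comparing version strings directly gives the wrong answer for MMCV 1.3.13 against 1.3.5. The sketch below compares numeric components instead. `release_tuple` and `meets_minimum` are hypothetical helpers and deliberately ignore pre-release and local-build suffixes.

```python
import re


def release_tuple(version):
    """Parse the leading dotted-number part of a version string into a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric version found in {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Return True if the installed release is at least the minimum release."""
    return release_tuple(installed) >= release_tuple(minimum)


# MMCV version from the environment dumps in this thread vs. the documented minimum.
mmcv_ok = meets_minimum("1.3.13", "1.3.5")

# Plain string comparison is lexicographic and gets this case wrong.
naive_ok = "1.3.13" >= "1.3.5"
```

Here `mmcv_ok` is True while `naive_ok` is False, because the string comparison stops at the character "1" versus "5".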
