ONNX-exported model differs too much from PyTorch model
Thanks for your error report and we appreciate it a lot.
Describe the bug
I trained Faster R-CNN on a custom COCO dataset and exported it to ONNX using pytorch2onnx.py. The ONNX model's results differ too much from the PyTorch model's.
I retried training and exporting with RetinaNet, and this problem did not occur.
Reproduction
- What command or script did you run?
python /home/jovyan/work/mmdetection/tools/train.py \
    /home/jovyan/work/vdetection/work_dirs/chess_faster_mmdet2/faster.py

python /home/jovyan/work/mmdetection/tools/deployment/pytorch2onnx.py \
    /home/jovyan/work/vdetection/work_dirs/chess_faster_mmdet2/faster.py \
    /home/jovyan/work/vdetection/work_dirs/chess_faster_mmdet2/epoch_12.pth \
    --output-file /home/jovyan/work/vdetection/work_dirs/chess_faster_mmdet2/tmp.onnx --verify \
    --input-img /home/jovyan/work/vdetection/work_dirs/chess_faster_mmdet2/472613.jpg
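As a first sanity check on the export itself, the graph can be loaded and validated before any runtime comparison. A minimal sketch, assuming the onnx Python package is available; the path is the --output-file from the command above:

import onnx

onnx_path = '/home/jovyan/work/vdetection/work_dirs/chess_faster_mmdet2/tmp.onnx'
model = onnx.load(onnx_path)
onnx.checker.check_model(model)               # raises if the exported graph is structurally invalid
print([i.name for i in model.graph.input])    # the image tensor (plus initializers, depending on export options)
print([o.name for o in model.graph.output])   # mmdet detectors are typically exported with dets and labels outputs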
I provide links to the config, checkpoint, log, and ONNX file in this link.
Environment
2021-09-20 10:51:43,820 - mmdet - INFO - Environment info:
sys.platform: linux
Python: 3.8.8 (default, Feb 24 2021, 21:46:12) [GCC 7.3.0]
CUDA available: True
GPU 0,1: NVIDIA GeForce GTX TITAN X
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.3.r11.3/compiler.29745058_0
GCC: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
PyTorch: 1.9.0a0+2ecb2c7
PyTorch compiling details: PyTorch built with:
- GCC 9.3
- C++ Version: 201402
- Intel® Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel® 64 architecture applications
- Intel® MKL-DNN v1.8.0 (Git Hash N/A)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.3
- NVCC architecture flags: -gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86
- CuDNN 8.2
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
TorchVision: 0.9.0a0
OpenCV: 4.5.3
MMCV: 1.3.13
MMCV Compiler: GCC 9.3
MMCV CUDA Compiler: 11.3
MMDetection: 2.16.0+0d66ba3
Error traceback
If applicable, paste the error traceback here.
Traceback (most recent call last):
File "/home/jovyan/work/mmdetection/tools/deployment/pytorch2onnx.py", line 325, in <module>
pytorch2onnx(
File "/home/jovyan/work/mmdetection/tools/deployment/pytorch2onnx.py", line 197, in pytorch2onnx
np.testing.assert_allclose(
File "/opt/conda/lib/python3.8/site-packages/numpy/testing/_private/utils.py", line 1528, in assert_allclose
assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
File "/opt/conda/lib/python3.8/site-packages/numpy/testing/_private/utils.py", line 761, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=0.001, atol=1e-05
The numerical values are different between Pytorch and ONNX, but it does not necessarily mean the exported ONNX model is problematic.
(shapes (4, 5), (0, 5) mismatch)
x: array([[-2., -2., -2., -2., 0.],
[-2., -2., -2., -2., 0.],
[-2., -2., -2., -2., 0.],
[-2., -2., -2., -2., 0.]], dtype=float32)
y: array([], shape=(0, 5), dtype=float32)
ONNX results (attached image)
PyTorch model results (attached image)
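For context, the error raised by --verify comes from the np.testing.assert_allclose call at pytorch2onnx.py line 197 in the traceback, which compares the detections from the two models. The sketch below is a rough manual re-creation of that comparison so the two outputs can be inspected side by side; the paths are the ones from the reproduction commands above, while the input shape, the normalization constants, and the (dets, labels) output pair are assumptions that must be checked against faster.py and tmp.onnx.

import cv2
import numpy as np
import onnxruntime as ort
from mmdet.apis import init_detector, inference_detector

cfg = '/home/jovyan/work/vdetection/work_dirs/chess_faster_mmdet2/faster.py'
ckpt = '/home/jovyan/work/vdetection/work_dirs/chess_faster_mmdet2/epoch_12.pth'
img_path = '/home/jovyan/work/vdetection/work_dirs/chess_faster_mmdet2/472613.jpg'
onnx_path = '/home/jovyan/work/vdetection/work_dirs/chess_faster_mmdet2/tmp.onnx'

# PyTorch side: standard mmdet inference, one (N, 5) array of boxes per class.
model = init_detector(cfg, ckpt, device='cuda:0')
pt_result = inference_detector(model, img_path)

# ONNX side: raw onnxruntime session on a manually preprocessed image.
sess = ort.InferenceSession(onnx_path)
inp = sess.get_inputs()[0]
h, w = 800, 1216  # assumed export shape; replace with the real input shape of tmp.onnx

img = cv2.imread(img_path)[:, :, ::-1].astype(np.float32)   # BGR -> RGB
img = cv2.resize(img, (w, h))
mean = np.array([123.675, 116.28, 103.53], np.float32)      # assumed defaults; take from faster.py
std = np.array([58.395, 57.12, 57.375], np.float32)
blob = ((img - mean) / std).transpose(2, 0, 1)[None]        # NCHW float32

dets, labels = sess.run(None, {inp.name: blob})             # assumed outputs: (1, num_dets, 5), (1, num_dets)

print('pytorch boxes per class:', [b.shape[0] for b in pt_result])
print('onnx boxes above 0.3 score:', dets[0][dets[0, :, 4] > 0.3])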
Bug fix
@Hussni-V Could you try with PyTorch==1.8.0 and ONNXRuntime==1.8.0? It tested OK on my machine.
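One quick way to confirm which versions actually ended up in the environment after switching (a trivial sketch, assuming both packages import cleanly):

import torch
import onnxruntime

print('torch:', torch.__version__)              # expected 1.8.0 for the suggested setup
print('onnxruntime:', onnxruntime.__version__)  # expected 1.8.0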
@jshilong We have this in the doc: https://github.com/open-mmlab/mmdetection/blob/master/docs/tutorials/pytorch2onnx.md#list-of-supported-models-exportable-to-onnx
Minimum required version of MMCV is 1.3.5
All models above are tested with Pytorch==1.6.0 and onnxruntime==1.5.1, except for CornerNet. For more details about the torch version when exporting CornerNet to ONNX, which involves mmcv::cummax, please refer to the Known Issues in mmcv.