ONNX-exported model differs too much from PyTorch model
Thanks for your error report and we appreciate it a lot.
Describe the bug
I trained Faster R-CNN on a custom COCO dataset and exported it to ONNX using pytorch2onnx.py. The ONNX model's results differ too much from the PyTorch model's.
I retried training and exporting with RetinaNet, and this problem did not occur.
Reproduction
- What command or script did you run?
python /home/jovyan/work/mmdetection/tools/train.py \
    /home/jovyan/work/vdetection/work_dirs/chess_faster_mmdet2/faster.py

python /home/jovyan/work/mmdetection/tools/deployment/pytorch2onnx.py \
    /home/jovyan/work/vdetection/work_dirs/chess_faster_mmdet2/faster.py \
    /home/jovyan/work/vdetection/work_dirs/chess_faster_mmdet2/epoch_12.pth \
    --output-file /home/jovyan/work/vdetection/work_dirs/chess_faster_mmdet2/tmp.onnx --verify \
    --input-img /home/jovyan/work/vdetection/work_dirs/chess_faster_mmdet2/472613.jpg
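As a first sanity check on the export itself, the graph can be loaded and validated before any runtime comparison. A minimal sketch, assuming the onnx Python package is available; the path is the --output-file from the command above:

import onnx

onnx_path = '/home/jovyan/work/vdetection/work_dirs/chess_faster_mmdet2/tmp.onnx'
model = onnx.load(onnx_path)
onnx.checker.check_model(model)               # raises if the exported graph is structurally invalid
print([i.name for i in model.graph.input])    # the image tensor (plus initializers, depending on export options)
print([o.name for o in model.graph.output])   # mmdet detectors are typically exported with dets and labels outputs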
I provide links to the config, checkpoint, log, and ONNX file in this link.
Environment
2021-09-20 10:51:43,820 - mmdet - INFO - Environment info:
sys.platform: linux
Python: 3.8.8 (default, Feb 24 2021, 21:46:12) [GCC 7.3.0]
CUDA available: True
GPU 0,1: NVIDIA GeForce GTX TITAN X
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.3.r11.3/compiler.29745058_0
GCC: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
PyTorch: 1.9.0a0+2ecb2c7
PyTorch compiling details: PyTorch built with:
- GCC 9.3
- C++ Version: 201402
- Intel® Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel® 64 architecture applications
- Intel® MKL-DNN v1.8.0 (Git Hash N/A)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.3
- NVCC architecture flags: -gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86
- CuDNN 8.2
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
TorchVision: 0.9.0a0
OpenCV: 4.5.3
MMCV: 1.3.13
MMCV Compiler: GCC 9.3
MMCV CUDA Compiler: 11.3
MMDetection: 2.16.0+0d66ba3
Error traceback
If applicable, paste the error traceback here.
Traceback (most recent call last):
File "/home/jovyan/work/mmdetection/tools/deployment/pytorch2onnx.py", line 325, in <module>
pytorch2onnx(
File "/home/jovyan/work/mmdetection/tools/deployment/pytorch2onnx.py", line 197, in pytorch2onnx
np.testing.assert_allclose(
File "/opt/conda/lib/python3.8/site-packages/numpy/testing/_private/utils.py", line 1528, in assert_allclose
assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
File "/opt/conda/lib/python3.8/site-packages/numpy/testing/_private/utils.py", line 761, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=0.001, atol=1e-05
The numerical values are different between Pytorch and ONNX, but it does not necessarily mean the exported ONNX model is problematic.
(shapes (4, 5), (0, 5) mismatch)
x: array([[-2., -2., -2., -2., 0.],
[-2., -2., -2., -2., 0.],
[-2., -2., -2., -2., 0.],
[-2., -2., -2., -2., 0.]], dtype=float32)
y: array([], shape=(0, 5), dtype=float32)
ONNX results (attached image)
PyTorch model results (attached image)
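For context, the error raised by --verify comes from the np.testing.assert_allclose call at pytorch2onnx.py line 197 in the traceback, which compares the detections from the two models. The sketch below is a rough manual re-creation of that comparison so the two outputs can be inspected side by side; the paths are the ones from the reproduction commands above, while the input shape, the normalization constants, and the (dets, labels) output pair are assumptions that must be checked against faster.py and tmp.onnx.

import cv2
import numpy as np
import onnxruntime as ort
from mmdet.apis import init_detector, inference_detector

cfg = '/home/jovyan/work/vdetection/work_dirs/chess_faster_mmdet2/faster.py'
ckpt = '/home/jovyan/work/vdetection/work_dirs/chess_faster_mmdet2/epoch_12.pth'
img_path = '/home/jovyan/work/vdetection/work_dirs/chess_faster_mmdet2/472613.jpg'
onnx_path = '/home/jovyan/work/vdetection/work_dirs/chess_faster_mmdet2/tmp.onnx'

# PyTorch side: standard mmdet inference, one (N, 5) array of boxes per class.
model = init_detector(cfg, ckpt, device='cuda:0')
pt_result = inference_detector(model, img_path)

# ONNX side: raw onnxruntime session on a manually preprocessed image.
sess = ort.InferenceSession(onnx_path)
inp = sess.get_inputs()[0]
h, w = 800, 1216  # assumed export shape; replace with the real input shape of tmp.onnx

img = cv2.imread(img_path)[:, :, ::-1].astype(np.float32)   # BGR -> RGB
img = cv2.resize(img, (w, h))
mean = np.array([123.675, 116.28, 103.53], np.float32)      # assumed defaults; take from faster.py
std = np.array([58.395, 57.12, 57.375], np.float32)
blob = ((img - mean) / std).transpose(2, 0, 1)[None]        # NCHW float32

dets, labels = sess.run(None, {inp.name: blob})             # assumed outputs: (1, num_dets, 5), (1, num_dets)

print('pytorch boxes per class:', [b.shape[0] for b in pt_result])
print('onnx boxes above 0.3 score:', dets[0][dets[0, :, 4] > 0.3])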
Bug fix
@Hussni-V Could you try with PyTorch==1.8.0 and ONNXRuntime==1.8.0? It tested OK on my machine.
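One quick way to confirm which versions actually ended up in the environment after switching (a trivial sketch, assuming both packages import cleanly):

import torch
import onnxruntime

print('torch:', torch.__version__)              # expected 1.8.0 for the suggested setup
print('onnxruntime:', onnxruntime.__version__)  # expected 1.8.0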
@jshilong We have this in the doc: https://github.com/open-mmlab/mmdetection/blob/master/docs/tutorials/pytorch2onnx.md#list-of-supported-models-exportable-to-onnx
Minimum required version of MMCV is 1.3.5
All models above are tested with Pytorch==1.6.0 and onnxruntime==1.5.1, except for CornerNet. For more details about the torch version when exporting CornerNet to ONNX, which involves mmcv::cummax, please refer to the Known Issues in mmcv.