`find_unused_parameters` after several epoch training when training YOLOX
See original GitHub issueThanks for your error report and we appreciate it a lot.
Checklist
- I have searched related issues but cannot get the expected help.
- I have read the FAQ documentation but cannot get the expected help.
- The bug has not been fixed in the latest version.
Describe the bug
When training YOLOX, there is a find_unused_parameters
after several epoch training
And I follow link1 that set detect_anomalous_params=True
After that, there produce some hint said the weight and bias of multi_level_conv_reg
does not join the loss computation, it is really weird. All I change is using my dataset and it’s fine when I train a Faster RCNN
Reproduction
- What command or script did you run?
python ./tools/train.py ./configs/alpha_mot0220/yolox_s_8x8_300e_coco_car.py
-
Did you make any modifications on the code or config? Did you understand what you have modified?
-
What dataset did you use? a own dataset that using COCO format Environment
-
Please run
python mmdet/utils/collect_env.py
to collect necessary environment information and paste it here.
sys.platform: linux
Python: 3.7.11 (default, Jul 27 2021, 14:32:16) [GCC 7.5.0]
CUDA available: False
GCC: gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
PyTorch: 1.9.0
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
TorchVision: 0.10.0
OpenCV: 4.5.5
MMCV: 1.4.6
MMCV Compiler: GCC 7.4
MMCV CUDA Compiler: 10.2
MMDetection: 2.21.0+e359d3f
Error traceback If applicable, paste the error trackback here.
A placeholder for trackback.
Bug fix If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!
Issue Analytics
- State:
- Created 2 years ago
- Comments:18
Sorry for late. I found the problem is thers is some annotation is empty on my own data. And it may make
reg_conv
not participate in the gradient propogation. In other word, SimOTA cannot deal with some img without gt_label.And yolox seems to have issue at testing stage as well. I will find out what’s wrong recentelly.
The mmdet 3.x/mmyolo branch has been fixed. Please use the latest version