`find_unused_parameters` error after several epochs when training YOLOX

Thanks for your error report and we appreciate it a lot.

Checklist

  1. I have searched related issues but cannot get the expected help.
  2. I have read the FAQ documentation but cannot get the expected help.
  3. The bug has not been fixed in the latest version.

Describe the bug

When training YOLOX, a `find_unused_parameters` error appears after several epochs of training. Following link1, I set `detect_anomalous_params=True`. The resulting hints say that the weight and bias of `multi_level_conv_reg` do not take part in the loss computation, which is really weird. The only thing I changed is the dataset (I use my own), and training a Faster RCNN works fine.
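For readers who want to reproduce the diagnostic step, the two settings mentioned above could look roughly like this in an MMDetection 2.x config. This is a sketch under assumptions: the values shown (including `grad_clip=None`) are illustrative and are not taken from the reporter's config file.

```python
# Illustrative additions to an MMDetection 2.x config file (assumed values,
# not the reporter's actual settings).

# Ask the optimizer hook to log parameters that did not take part in the
# loss computation during an iteration.
optimizer_config = dict(grad_clip=None, detect_anomalous_params=True)

# In distributed training, let DistributedDataParallel tolerate parameters
# that receive no gradient. This hides the symptom, not the cause.
find_unused_parameters = True
```

Setting `find_unused_parameters = True` only suppresses the error, so the hints from `detect_anomalous_params` are the more useful of the two for finding out why a layer is skipped.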

Reproduction

  1. What command or script did you run?
python ./tools/train.py ./configs/alpha_mot0220/yolox_s_8x8_300e_coco_car.py 
  2. Did you make any modifications on the code or config? Did you understand what you have modified?

  3. What dataset did you use? My own dataset in COCO format.

Environment

  1. Please run `python mmdet/utils/collect_env.py` to collect necessary environment information and paste it here.

sys.platform: linux
Python: 3.7.11 (default, Jul 27 2021, 14:32:16) [GCC 7.5.0]
CUDA available: False
GCC: gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
PyTorch: 1.9.0
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

TorchVision: 0.10.0
OpenCV: 4.5.5
MMCV: 1.4.6
MMCV Compiler: GCC 7.4
MMCV CUDA Compiler: 10.2
MMDetection: 2.21.0+e359d3f

Error traceback

If applicable, paste the error traceback here.

A placeholder for traceback.

Bug fix

If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 18

Top GitHub Comments

1 reaction
igo312 commented, Mar 10, 2022

Sorry for the late reply. I found the problem: some images in my own data have empty annotations, which may keep reg_conv from participating in gradient propagation. In other words, SimOTA cannot deal with images that have no gt_label.

YOLOX also seems to have an issue at the testing stage. I will look into what is wrong soon.
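One way to check a dataset for the cause described in this comment is to list the images that have no annotations. The sketch below uses only the standard library and assumes that "empty annotation" means an image with no entries in the COCO `annotations` list. The function names and the sample data are hypothetical and are not part of MMDetection.

```python
import json
from collections import Counter


def find_images_without_annotations(coco):
    """Return the ids of images that have no entry in coco["annotations"]."""
    counts = Counter(ann["image_id"] for ann in coco.get("annotations", []))
    return [img["id"] for img in coco.get("images", []) if counts[img["id"]] == 0]


def drop_images_without_annotations(coco):
    """Return a shallow copy of the dataset with annotation-free images removed."""
    empty_ids = set(find_images_without_annotations(coco))
    cleaned = dict(coco)
    cleaned["images"] = [
        img for img in coco.get("images", []) if img["id"] not in empty_ids
    ]
    return cleaned


# Tiny in-memory stand-in for a real annotation file (made-up values).
sample = {
    "images": [
        {"id": 1, "file_name": "a.jpg"},
        {"id": 2, "file_name": "b.jpg"},
        {"id": 3, "file_name": "c.jpg"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [0, 0, 5, 5]},
        {"id": 11, "image_id": 3, "category_id": 1, "bbox": [1, 1, 4, 4]},
    ],
    "categories": [{"id": 1, "name": "car"}],
}

print(find_images_without_annotations(sample))  # [2]
cleaned = drop_images_without_annotations(sample)
print(json.dumps([img["id"] for img in cleaned["images"]]))  # [1, 3]
```

For a real dataset, load the annotation file with `json.load` and pass the resulting dict to the same functions. Whether dropping such images is acceptable depends on the task, since images without objects can be useful negatives for other detectors.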

0 reactions
hhaAndroid commented, Oct 25, 2022

The mmdet 3.x/mmyolo branch has been fixed. Please use the latest version.
