Error during training (Assertion input_val >= zero && input_val <= one failed.)
See original GitHub issueProblem
thank you for contribution, I encountered gradient exploding during training the model tood_r50_fpn_1x_coco.
-
I tried to train this model in Mix-Precision Training strategy, and the loss scale was set ‘dynamic’. The training soon stopped, and raise RuntimeError: CUDA error: device-side assert triggered.
-
I also retrained the model with FP32 precision, but it did not work.
-
A lower lr did not address gradient exploding.
-
Gradient cutting helps avoid training failure (Mix-Precision Training, loss scale=512.) , but the model can not converge.
I try to google this issue. I think it is not OOM. It seems to relate with the NaN value in prediction head and further cause the error at calculating loss. I do not know if the environment(mmdet-1.15.0) affects with training.
My modification
- I port the TOOD code to my working environment (MMDet-1.15.0), without edit.
- I edit the training config to train my own dataset.
Environment
2021-12-09 16:50:01,643 - mmdet - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.7.11 (default, Jul 27 2021, 14:32:16) [GCC 7.5.0]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 2070
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.4.r11.4/compiler.30033411_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.9.0
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) oneAPI Math Kernel Library Version 2021.3-Product Build 20210617 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.1
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
- CuDNN 8.0.5
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
TorchVision: 0.10.0
OpenCV: 4.5.3
MMCV: 1.3.10
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMDetection: 2.15.0+87eda06
------------------------------------------------------------
Error Report
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [32,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [33,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [34,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [35,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [36,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [37,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [38,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [39,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [40,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [41,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [42,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [43,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [44,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [45,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [46,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [47,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [48,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [49,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [50,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [51,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [52,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [53,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [54,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [55,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [56,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [57,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [58,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [59,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [60,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [61,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [62,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [63,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [32,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [33,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [34,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [35,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [36,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [37,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [38,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [39,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [40,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [41,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [42,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [43,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [44,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [45,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [46,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [47,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [48,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [49,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [50,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [51,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [52,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [53,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [54,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [55,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [56,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [57,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [58,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [59,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [60,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [61,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [62,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [63,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [0,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [1,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [2,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [3,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [4,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [5,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [6,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [7,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [8,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [9,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [10,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [11,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [12,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [13,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [14,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [15,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [16,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [17,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [18,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [19,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [20,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [21,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [22,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [23,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [24,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [25,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [26,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [27,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [28,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [29,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [30,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [31,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [0,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [1,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [2,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [3,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [4,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [5,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [6,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [7,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [8,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [9,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [10,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [11,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [12,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [13,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [14,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [15,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [16,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [17,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [18,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [19,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [20,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [21,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [22,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [23,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [24,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [25,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [26,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [27,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [28,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [29,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [30,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [31,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [0,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [1,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [2,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [3,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [4,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [5,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [6,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [7,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [8,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [9,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [10,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [11,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [12,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [13,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [14,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [15,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [16,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [17,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [18,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [19,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [20,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [21,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [22,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [23,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [24,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [25,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [26,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [27,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [28,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [29,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [30,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [31,0,0] Assertion `input_val >= zero && input_val <= one` failed.
Traceback (most recent call last):
File "tools/train.py", line 188, in <module>
main()
File "tools/train.py", line 184, in main
meta=meta)
File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmdet-2.15.0-py3.7.egg/mmdet/apis/train.py", line 170, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
**kwargs)
File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
return self.module.train_step(*inputs[0], **kwargs[0])
File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmdet-2.15.0-py3.7.egg/mmdet/models/detectors/base.py", line 237, in train_step
losses = self(**data)
File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 97, in new_func
return old_func(*args, **kwargs)
File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmdet-2.15.0-py3.7.egg/mmdet/models/detectors/base.py", line 171, in forward
return self.forward_train(img, img_metas, **kwargs)
File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmdet-2.15.0-py3.7.egg/mmdet/models/detectors/single_stage.py", line 83, in forward_train
gt_labels, gt_bboxes_ignore)
File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmdet-2.15.0-py3.7.egg/mmdet/models/dense_heads/base_dense_head.py", line 54, in forward_train
losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 185, in new_func
return old_func(*args, **kwargs)
File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmdet-2.15.0-py3.7.egg/mmdet/models/dense_heads/tood_head.py", line 426, in loss
num_total_samples=num_total_samples)
File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmdet-2.15.0-py3.7.egg/mmdet/core/utils/misc.py", line 29, in multi_apply
return tuple(map(list, zip(*map_results)))
File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmdet-2.15.0-py3.7.egg/mmdet/models/dense_heads/tood_head.py", line 333, in loss_single
& (labels < bg_class_ind)).nonzero().squeeze(1)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1623448265233/work/c10/cuda/CUDACachingAllocator.cpp:1055 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f12c21efa22 in /root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x10ac3 (0x7f12c2451ac3 in /root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1a7 (0x7f12c2453167 in /root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x54 (0x7f12c21d95a4 in /root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #4: <unknown function> + 0xa2bb12 (0x7f133bad0b12 in /root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0xa2bbb1 (0x7f133bad0bb1 in /root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #24: __libc_start_main + 0xe7 (0x7f1376d75bf7 in /lib/x86_64-linux-gnu/libc.so.6)
Aborted
Issue Analytics
- State:
- Created 2 years ago
- Comments:6 (1 by maintainers)
Top GitHub Comments
You can try to clamp the value of the box area when computing GIoU loss, e.g., https://github.com/fcjian/TOOD/blob/93b3a87556e361f7d56507bd56943cf121c3caa2/mmdet/core/bbox/iou_calculators/iou2d_calculator.py#L212-L215
@fcjian Thanks for reply! It solves the CUDA error, but the model can not converge. During training, a problem similar with gradient cutting happened. The log shows a sudden increase of loss. After that, the loss fluctuates in a tiny range. I’ll try again with the original TOOD code without transfering to higher mmdet version.