[Reimplementation] IndexError: argmax() in refine_bboxes function of sparse-rcnn (latest version 3.x)

Prerequisite

I have searched Issues and Discussions but cannot get the expected help.
I have read the FAQ documentation but cannot get the expected help.
The bug has not been fixed in the latest version (master) or latest version (3.x).

💬 Describe the reimplementation questions

When I try to train the sparse-rcnn series in the latest version (3.x) of mmdet on my personal dataset (only 1 class), the following error always occurs:

  File "I:\mmdetection-3.x\mmdetection\mmdet\models\detectors\base.py", line 92, in forward
    return self.loss(inputs, data_samples)
  File "I:\mmdetection-3.x\mmdetection\mmdet\models\detectors\two_stage.py", line 187, in loss
    roi_losses = self.roi_head.loss(x, rpn_results_list,
  File "I:\mmdetection-3.x\mmdetection\mmdet\models\roi_heads\sparse_roi_head.py", line 344, in loss
    bbox_results = self.bbox_loss(
  File "I:\mmdetection-3.x\mmdetection\mmdet\models\roi_heads\sparse_roi_head.py", line 122, in bbox_loss
    bbox_results = self._bbox_forward(stage, x, rois, object_feats,
  File "I:\mmdetection-3.x\mmdetection\mmdet\models\roi_heads\sparse_roi_head.py", line 230, in _bbox_forward
    results_list = bbox_head.refine_bboxes(
  File "I:\mmdetection-3.x\mmdetection\mmdet\models\roi_heads\bbox_heads\bbox_head.py", line 645, in refine_bboxes
    cls_scores[:, :-1].argmax(1), labels)
IndexError: argmax(): Expected reduction dim 1 to have non-zero size.

Process finished with exit code 1

The shape of cls_scores is (100, 1), where 100 is num_proposals. I guess this 1 should be 2, because there are two categories: foreground and background.
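To make the failure concrete, here is a minimal sketch of what the slice in the traceback does to a (100, 1) score matrix. It uses plain Python lists as a stand-in for tensors so it runs without PyTorch; the helpers `drop_last_column` and `argmax_per_row` are hypothetical and only mimic `cls_scores[:, :-1]` and `.argmax(1)`, and the exception type differs (ValueError here, IndexError in PyTorch).

```python
# Plain-Python stand-in for the tensor operations in the traceback.
# The helper names are hypothetical; they are not part of mmdet or PyTorch.
num_proposals = 100


def drop_last_column(scores):
    """Mimic `cls_scores[:, :-1]`: remove the final column of every row."""
    return [row[:-1] for row in scores]


def argmax_per_row(scores):
    """Mimic `.argmax(1)`: index of the largest value in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Shape (100, 1), as reported in this issue.
one_column = [[0.9] for _ in range(num_proposals)]
# Shape (100, 2), which is what I expected (foreground + background).
two_columns = [[0.9, 0.1] for _ in range(num_proposals)]

sliced_one = drop_last_column(one_column)    # every row is now empty
sliced_two = drop_last_column(two_columns)   # every row keeps one score

labels_two = argmax_per_row(sliced_two)      # works: one column is left

try:
    argmax_per_row(sliced_one)               # nothing left to reduce over
    failed = False
except ValueError:
    failed = True

print(len(sliced_two[0]), len(sliced_one[0]), failed)
```

With two columns the slice leaves one score per proposal and the argmax succeeds; with one column the slice leaves zero columns, which matches the "Expected reduction dim 1 to have non-zero size" message.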

In the config file, I only modified the num_classes option in the head and the dataset paths.

Environment

sys.platform: win32
Python: 3.8.12 (default, Oct 12 2021, 03:01:40) [MSC v.1916 64 bit (AMD64)]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0: NVIDIA GeForce RTX 3090
CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4
NVCC: Cuda compilation tools, release 11.4, V11.4.152
GCC: n/a
PyTorch: 1.11.0
PyTorch compiling details: PyTorch built with:
  - C++ Version: 199711
  - MSVC 192829337
  - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e)
  - OpenMP 2019
  - LAPACK is enabled (usually provided by MKL)
  - CPU capability usage: AVX512
  - CUDA Runtime 11.3
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.2
  - Magma 2.5.4
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=C:/cb/pytorch_1000000000000/work/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -DUSE_PTHREADPOOL -openmp:experimental -IC:/cb/pytorch_1000000000000/work/mkl/include -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF,

TorchVision: 0.12.0
OpenCV: 4.1.2
MMEngine: 0.3.1
MMDetection: 3.0.0rc3+5b0d5b4

Expected results

No response

Additional information

1. In the config, I:

changed the data and annotation paths
added metainfo to set my ‘CLASSES’
changed ‘num_classes’ from 80 to 1

2. I also tested mmdet 3.0 with my dataset using cascade rcnn, and there the second dim of cls_scores is 2. So is there a problem in DIIHead, or am I training sparse rcnn the wrong way?
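The difference between the two heads could be handled by checking the score width before slicing. The sketch below assumes the last column is a background score only when the width is num_classes + 1, and that a width of exactly num_classes means there is no background column (an assumption about DIIHead that this issue does not confirm). `foreground_scores` is a hypothetical helper written with plain lists, not mmdet's code.

```python
def foreground_scores(scores, num_classes):
    """Return per-class scores without a background column.

    Hypothetical helper: it slices off the last column only when the
    row width shows that a background column is actually present.
    """
    width = len(scores[0]) if scores else 0
    if width == num_classes + 1:
        # Background column present: drop it, like `cls_scores[:, :-1]`.
        return [row[:-1] for row in scores]
    if width == num_classes:
        # No background column: keep every score.
        return [list(row) for row in scores]
    raise ValueError(
        f"expected {num_classes} or {num_classes + 1} columns, got {width}"
    )


# Width 2, as I observed with cascade rcnn (num_classes=1).
cascade_like = [[0.8, 0.2], [0.3, 0.7]]
# Width 1, as I observed with sparse rcnn (num_classes=1).
sparse_like = [[0.8], [0.3]]

fg_cascade = foreground_scores(cascade_like, num_classes=1)
fg_sparse = foreground_scores(sparse_like, num_classes=1)

print(fg_cascade, fg_sparse)
```

Under these assumptions both inputs end up with one foreground score per proposal, so a later argmax over dim 1 would have something to reduce over in either case.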