[Reimplementation] YOLOX bbox pred using t,l,r,b format always produces loss_bbox: 5.0

Prerequisite

I have searched Issues and Discussions but cannot get the expected help.
I have read the FAQ documentation but cannot get the expected help.
The bug has not been fixed in the latest version (master) or latest version (3.x).

💬 Describe the reimplementation questions

I replaced the _bbox_decode function in YOLOX with the following:

    def _bbox_decode(self, priors, bbox_preds):
        tl_x = priors[..., 0] - bbox_preds[..., 0]
        tl_y = priors[..., 1] - bbox_preds[..., 1]
        br_x = priors[..., 0] + bbox_preds[..., 2]
        br_y = priors[..., 1] + bbox_preds[..., 3]
        return torch.stack([tl_x, tl_y, br_x, br_y], dim=-1)
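
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the replacement uses the raw head outputs directly as pixel distances. Early in training those outputs are close to zero and can be negative, so the decoded boxes are tiny or inverted and their IoU with the ground truth is essentially zero. Assuming the base config uses a squared IoU loss (1 - IoU^2) with loss_weight=5.0, which is not shown in this post, that yields a constant 5.0 with almost no gradient. The pure-Python example below reproduces the arithmetic for a single box. The names `decode_ltrb`, `iou`, and `square_iou_loss`, the sample numbers, and the exp-and-stride variant are all illustrative assumptions, not MMDetection code.

```python
import math


def decode_ltrb(prior_xy, pred, stride=1.0, use_exp=False):
    """Decode (l, t, r, b) distances around a prior point into (x1, y1, x2, y2).

    With stride=1.0 and use_exp=False this matches the decode in the post.
    """
    dists = [math.exp(v) if use_exp else v for v in pred]
    l, t, r, b = (d * stride for d in dists)
    x, y = prior_xy
    return (x - l, y - t, x + r, y + b)


def iou(a, b, eps=1e-16):
    """Plain IoU of two (x1, y1, x2, y2) boxes; overlap is clamped at zero."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = max(area_a + area_b - inter, eps)
    return inter / union


def square_iou_loss(pred_box, gt_box, weight=5.0):
    """Assumed loss form: weight * (1 - IoU^2)."""
    return weight * (1.0 - iou(pred_box, gt_box) ** 2)


gt = (100.0, 100.0, 200.0, 200.0)   # hypothetical ground-truth box
prior = (144.0, 144.0)              # hypothetical prior point inside it
raw = (0.01, -0.02, 0.03, 0.01)     # hypothetical near-zero head outputs

plain = decode_ltrb(prior, raw)                             # decode from the post
scaled = decode_ltrb(prior, raw, stride=32, use_exp=True)   # exp + stride variant

plain_loss = square_iou_loss(plain, gt)    # 5.0: the box is inverted, IoU is 0
scaled_loss = square_iou_loss(scaled, gt)  # below 5.0: the box has real overlap
print(plain, plain_loss)
print(scaled, scaled_loss)
```

If this reading is right, keeping the distances positive (for example with exp or ReLU) and scaling them by the stride of each level would be worth trying before changing anything else.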

My config:

_base_ = [
    '../configs/yolox/yolox_s_8x8_300e_coco.py'
]

# dataset settings
dataset_type = 'CocoDataset'
data_root = 'path'

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    _delete_=True,
    samples_per_gpu=8,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_minitrain2017.json',
        img_prefix=data_root + 'train2017/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline)
)
evaluation = dict(interval=1, metric='bbox')

# optimizer
optimizer = dict(_delete_=True, type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(_delete_=True, grad_clip=None)
# learning policy
lr_config = dict(
    _delete_=True,
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)

My result:

2022-10-25 09:29:29,856 - mmdet - INFO - Epoch [1][50/1564]	lr: 1.978e-03, eta: 1:15:21, time: 0.242, data_time: 0.070, memory: 10625, loss_cls: 0.9370, loss_bbox: 5.0000, loss_obj: 14.5174, loss: 20.4544
2022-10-25 09:29:38,077 - mmdet - INFO - Epoch [1][100/1564]	lr: 3.976e-03, eta: 1:03:09, time: 0.164, data_time: 0.019, memory: 10625, loss_cls: 0.7869, loss_bbox: 5.0000, loss_obj: 7.7109, loss: 13.4978
2022-10-25 09:29:47,287 - mmdet - INFO - Epoch [1][150/1564]	lr: 5.974e-03, eta: 1:01:02, time: 0.184, data_time: 0.018, memory: 10625, loss_cls: 0.6463, loss_bbox: 5.0000, loss_obj: 6.0049, loss: 11.6513
2022-10-25 09:29:55,211 - mmdet - INFO - Epoch [1][200/1564]	lr: 7.972e-03, eta: 0:57:55, time: 0.158, data_time: 0.019, memory: 10625, loss_cls: 0.5865, loss_bbox: 5.0000, loss_obj: 5.3746, loss: 10.9611
2022-10-25 09:30:03,355 - mmdet - INFO - Epoch [1][250/1564]	lr: 9.970e-03, eta: 0:56:15, time: 0.163, data_time: 0.019, memory: 10625, loss_cls: 0.3879, loss_bbox: 5.0000, loss_obj: 5.0617, loss: 10.4496
2022-10-25 09:30:12,697 - mmdet - INFO - Epoch [1][300/1564]	lr: 1.197e-02, eta: 0:56:20, time: 0.187, data_time: 0.017, memory: 10692, loss_cls: 0.3199, loss_bbox: 5.0000, loss_obj: 5.3904, loss: 10.7103
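
As a sanity check on the schedule, the lr column above is consistent with the linear warmup in the config (warmup_iters=500, warmup_ratio=0.001, base lr=0.02), so the learning rate itself does not look like the cause. The snippet below assumes the usual linear warmup formula, lr * (1 - (1 - i / warmup_iters) * (1 - warmup_ratio)), with a zero-based iteration index i; the function name `warmup_lr` is made up for this example.

```python
def warmup_lr(i, base_lr=0.02, warmup_iters=500, warmup_ratio=0.001):
    """Linear warmup learning rate at zero-based iteration i (assumed formula)."""
    k = (1 - i / warmup_iters) * (1 - warmup_ratio)
    return base_lr * (1 - k)


# The log prints one-based iteration counts, so subtract 1 before evaluating.
for logged_iter in (50, 100, 150, 200, 250, 300):
    print(logged_iter, f"{warmup_lr(logged_iter - 1):.3e}")
```

The printed values can be compared directly against the lr entries in the log lines.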

Environment

sys.platform: linux
Python: 3.9.12 (main, Apr 5 2022, 06:56:58) [GCC 7.5.0]
CUDA available: True
GPU 0,1: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.8, V11.8.89
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PyTorch: 1.10.1
PyTorch compiling details: PyTorch built with:

GCC 7.3
C++ Version: 201402
Intel® oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel® 64 architecture applications
Intel® MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
OpenMP 201511 (a.k.a. OpenMP 4.5)
LAPACK is enabled (usually provided by MKL)
NNPACK is enabled
CPU capability usage: AVX512
CUDA Runtime 11.3
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
CuDNN 8.2
Magma 2.5.2
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.11.2
OpenCV: 4.6.0
MMCV: 1.5.0
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.3
MMDetection: 2.25.1+1b4891c

Expected results

With the original YOLOX _bbox_decode function and the same config, loss_bbox decreases as expected:

2022-10-25 09:30:40,552 - mmdet - INFO - Epoch [1][50/1564]	lr: 1.978e-03, eta: 1:15:08, time: 0.241, data_time: 0.069, memory: 10627, loss_cls: 1.6528, loss_bbox: 4.7309, loss_obj: 12.7018, loss: 19.0855
2022-10-25 09:30:48,810 - mmdet - INFO - Epoch [1][100/1564]	lr: 3.976e-03, eta: 1:03:09, time: 0.165, data_time: 0.020, memory: 10627, loss_cls: 1.8888, loss_bbox: 4.4457, loss_obj: 5.9525, loss: 12.2871
2022-10-25 09:30:58,025 - mmdet - INFO - Epoch [1][150/1564]	lr: 5.974e-03, eta: 1:01:03, time: 0.184, data_time: 0.018, memory: 10627, loss_cls: 2.3753, loss_bbox: 4.0260, loss_obj: 5.8119, loss: 12.2131
2022-10-25 09:31:05,916 - mmdet - INFO - Epoch [1][200/1564]	lr: 7.972e-03, eta: 0:57:52, time: 0.158, data_time: 0.019, memory: 10627, loss_cls: 2.4342, loss_bbox: 3.9371, loss_obj: 5.5452, loss: 11.9165
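
To compare two runs like these without reading the logs by eye, the loss fields can be pulled out with a regular expression. This is only a sketch: `parse_losses` is a hypothetical helper, and the two sample strings are shortened copies of the first log line from each run in this post.

```python
import re


def parse_losses(line):
    """Return every 'loss...: value' pair found in one log line as floats."""
    return {name: float(value)
            for name, value in re.findall(r"(loss\w*): ([0-9.]+)", line)}


modified = ("Epoch [1][50/1564] loss_cls: 0.9370, loss_bbox: 5.0000, "
            "loss_obj: 14.5174, loss: 20.4544")
original = ("Epoch [1][50/1564] loss_cls: 1.6528, loss_bbox: 4.7309, "
            "loss_obj: 12.7018, loss: 19.0855")

for tag, line in (("modified", modified), ("original", original)):
    losses = parse_losses(line)
    parts = losses["loss_cls"] + losses["loss_bbox"] + losses["loss_obj"]
    # The total should equal the sum of the three components.
    print(tag, losses["loss_bbox"], abs(parts - losses["loss"]) < 1e-3)
```

Running the same helper over every line of a log makes a loss_bbox that never moves easy to spot.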

Additional information

No response

Issue Analytics

State:
Created a year ago
Comments: 11

Top GitHub Comments

iumyx2612 commented, Oct 26, 2022

In case it helps, I wrote a script to visualize the label assignment process of SimOTA.
I ran it on a test dataset from here: https://public.roboflow.com/object-detection/synthetic-fruit
This is YOLOX using the new _bbox_decode function, which decodes like FCOS, at iter 1:
[image: 49_jpg rf 003d87e7cc130a27308b3788502b8cf00]