[Reimplementation] YOLOX bbox pred using t,l,r,b format always produces loss_bbox: 5.0
Prerequisite
- I have searched Issues and Discussions but cannot get the expected help.
- I have read the FAQ documentation but cannot get the expected help.
- The bug has not been fixed in the latest version (master) or latest version (3.x).
💬 Describe the reimplementation questions
I replaced the _bbox_decode function in YOLOX with this:
def _bbox_decode(self, priors, bbox_preds):
    # priors[..., 0] and priors[..., 1] are the prior center x and y;
    # the four raw outputs are used directly as distances to the box edges.
    tl_x = priors[..., 0] - bbox_preds[..., 0]
    tl_y = priors[..., 1] - bbox_preds[..., 1]
    br_x = priors[..., 0] + bbox_preds[..., 2]
    br_y = priors[..., 1] + bbox_preds[..., 3]
    return torch.stack([tl_x, tl_y, br_x, br_y], dim=-1)
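For comparison, here is a minimal sketch of an FCOS-style decode that keeps the four distances positive and scales them by the prior stride. It is not a confirmed fix for this issue. It assumes the priors are laid out as (center_x, center_y, stride_w, stride_h) and that the raw outputs are meant to be exponentiated; the name bbox_decode_ltrb is hypothetical and not part of MMDetection.

```python
import torch


def bbox_decode_ltrb(priors, bbox_preds):
    """Hypothetical decode: (l, t, r, b) distances -> (x1, y1, x2, y2).

    priors:     (N, 4) tensor, assumed to be (center_x, center_y, stride_w, stride_h)
    bbox_preds: (..., N, 4) raw network outputs in (l, t, r, b) order
    """
    # Per-prior scale for each distance: (stride_w, stride_h, stride_w, stride_h).
    strides = torch.cat([priors[..., 2:], priors[..., 2:]], dim=-1)
    # exp() keeps every distance strictly positive, so x2 > x1 and y2 > y1.
    distances = bbox_preds.exp() * strides
    x1 = priors[..., 0] - distances[..., 0]
    y1 = priors[..., 1] - distances[..., 1]
    x2 = priors[..., 0] + distances[..., 2]
    y2 = priors[..., 1] + distances[..., 3]
    return torch.stack([x1, y1, x2, y2], dim=-1)


# One prior at (8, 8) with stride 8 and all-zero outputs: each distance is
# exp(0) * 8 = 8, so the decoded box is (0, 0, 16, 16).
example_priors = torch.tensor([[8.0, 8.0, 8.0, 8.0]])
print(bbox_decode_ltrb(example_priors, torch.zeros(1, 1, 4)))
```

With this form, an untrained head that outputs values near zero already yields boxes roughly two strides wide, rather than sub-pixel or inverted ones.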
My config:
_base_ = [
    '../configs/yolox/yolox_s_8x8_300e_coco.py'
]
# dataset settings
dataset_type = 'CocoDataset'
data_root = 'path'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    _delete_=True,
    samples_per_gpu=8,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_minitrain2017.json',
        img_prefix=data_root + 'train2017/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline)
)
evaluation = dict(interval=1, metric='bbox')
# optimizer
optimizer = dict(_delete_=True, type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(_delete_=True, grad_clip=None)
# learning policy
lr_config = dict(
    _delete_=True,
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
My result:
2022-10-25 09:29:29,856 - mmdet - INFO - Epoch [1][50/1564] lr: 1.978e-03, eta: 1:15:21, time: 0.242, data_time: 0.070, memory: 10625, loss_cls: 0.9370, loss_bbox: 5.0000, loss_obj: 14.5174, loss: 20.4544
2022-10-25 09:29:38,077 - mmdet - INFO - Epoch [1][100/1564] lr: 3.976e-03, eta: 1:03:09, time: 0.164, data_time: 0.019, memory: 10625, loss_cls: 0.7869, loss_bbox: 5.0000, loss_obj: 7.7109, loss: 13.4978
2022-10-25 09:29:47,287 - mmdet - INFO - Epoch [1][150/1564] lr: 5.974e-03, eta: 1:01:02, time: 0.184, data_time: 0.018, memory: 10625, loss_cls: 0.6463, loss_bbox: 5.0000, loss_obj: 6.0049, loss: 11.6513
2022-10-25 09:29:55,211 - mmdet - INFO - Epoch [1][200/1564] lr: 7.972e-03, eta: 0:57:55, time: 0.158, data_time: 0.019, memory: 10625, loss_cls: 0.5865, loss_bbox: 5.0000, loss_obj: 5.3746, loss: 10.9611
2022-10-25 09:30:03,355 - mmdet - INFO - Epoch [1][250/1564] lr: 9.970e-03, eta: 0:56:15, time: 0.163, data_time: 0.019, memory: 10625, loss_cls: 0.3879, loss_bbox: 5.0000, loss_obj: 5.0617, loss: 10.4496
2022-10-25 09:30:12,697 - mmdet - INFO - Epoch [1][300/1564] lr: 1.197e-02, eta: 0:56:20, time: 0.187, data_time: 0.017, memory: 10692, loss_cls: 0.3199, loss_bbox: 5.0000, loss_obj: 5.3904, loss: 10.7103
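One plausible reading of the constant 5.0000 is that every assigned prediction has an IoU of essentially zero with its ground-truth box. The pure-Python sketch below works through that arithmetic. It assumes the bbox loss is a squared IoU loss, loss_weight * (1 - IoU^2), with loss_weight=5.0 (believed to be the YOLOX head default, but not confirmed in this thread); the prior, ground-truth box, and raw outputs are made-up numbers for illustration.

```python
def decode_raw(prior_xy, pred):
    """Decode as in the modified _bbox_decode: raw outputs used directly as distances."""
    px, py = prior_xy
    return (px - pred[0], py - pred[1], px + pred[2], py + pred[3])


def iou_xyxy(a, b):
    """IoU of two (x1, y1, x2, y2) boxes; inverted or empty boxes contribute zero area."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def squared_iou_loss(pred_box, gt_box, loss_weight=5.0):
    """loss_weight * (1 - IoU^2); loss_weight=5.0 is an assumed default."""
    return loss_weight * (1.0 - iou_xyxy(pred_box, gt_box) ** 2)


gt = (64.0, 64.0, 160.0, 160.0)   # illustrative 96x96 ground-truth box
prior = (104.0, 104.0)            # illustrative prior center inside the box

# Negative raw outputs flip the corners, so the box is inverted and IoU is 0.
inverted = decode_raw(prior, (-0.3, 0.2, 0.1, -0.4))
# Small positive raw outputs give a valid but sub-pixel box, so IoU is near 0.
tiny = decode_raw(prior, (0.3, 0.2, 0.1, 0.4))

print(squared_iou_loss(inverted, gt))        # 5.0
print(round(squared_iou_loss(tiny, gt), 4))  # 5.0
```

Under these assumptions, both inverted boxes and unscaled sub-pixel boxes would log as loss_bbox: 5.0000 at four decimal places.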
Environment
sys.platform: linux
Python: 3.9.12 (main, Apr 5 2022, 06:56:58) [GCC 7.5.0]
CUDA available: True
GPU 0,1: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.8, V11.8.89
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PyTorch: 1.10.1
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel® oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel® 64 architecture applications
- Intel® MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX512
- CUDA Runtime 11.3
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
- CuDNN 8.2
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
TorchVision: 0.11.2
OpenCV: 4.6.0
MMCV: 1.5.0
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.3
MMDetection: 2.25.1+1b4891c
Expected results
Normal YOLOX bbox prediction with the same config and the original _bbox_decode function:
2022-10-25 09:30:40,552 - mmdet - INFO - Epoch [1][50/1564] lr: 1.978e-03, eta: 1:15:08, time: 0.241, data_time: 0.069, memory: 10627, loss_cls: 1.6528, loss_bbox: 4.7309, loss_obj: 12.7018, loss: 19.0855
2022-10-25 09:30:48,810 - mmdet - INFO - Epoch [1][100/1564] lr: 3.976e-03, eta: 1:03:09, time: 0.165, data_time: 0.020, memory: 10627, loss_cls: 1.8888, loss_bbox: 4.4457, loss_obj: 5.9525, loss: 12.2871
2022-10-25 09:30:58,025 - mmdet - INFO - Epoch [1][150/1564] lr: 5.974e-03, eta: 1:01:03, time: 0.184, data_time: 0.018, memory: 10627, loss_cls: 2.3753, loss_bbox: 4.0260, loss_obj: 5.8119, loss: 12.2131
2022-10-25 09:31:05,916 - mmdet - INFO - Epoch [1][200/1564] lr: 7.972e-03, eta: 0:57:52, time: 0.158, data_time: 0.019, memory: 10627, loss_cls: 2.4342, loss_bbox: 3.9371, loss_obj: 5.5452, loss: 11.9165
Additional information
No response
Top GitHub Comments
If this can be helpful, I wrote a script to visualize the label assignment process of simOTA.
I ran this on a test dataset here: https://public.roboflow.com/object-detection/synthetic-fruit
This is YOLOX using the new _bbox_decode function, which decodes like FCOS:
[Images: label assignment at iter 1, iter 11, iter 31]
The bigger the dot, the higher the feature pyramid level it belongs to (larger stride).
And this is normal YOLOX with the original _bbox_decode:
[Images: label assignment at iter 1, iter 11, iter 31]
@hhaAndroid please notify me when you start working on this 🙇‍♂️