Training fcos_r50_caffe_fpn_gn_1x_4gpu only gets 33.6 AP
I can't reach the model zoo's 36.9 AP.
The training command is:
python tools/train.py own/configs/fcos/own_fcos_r50_caffe_fpn_gn_1x_4gpu.py --gpus 2 --work_dir own/work/fcos/resnet50/coco17
I trained with 2 GPUs. The config is below; I modified the norm_cfg and the lr (a quick check of where GN ends up is sketched right after the config).
# model settings
model = dict(
    type='FCOS',
    pretrained='open-mmlab://resnet50_caffe',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='GN', num_groups=32, requires_grad=True),
        style='caffe'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs=True,
        extra_convs_on_inputs=False,  # use P5
        num_outs=5,
        relu_before_extra_convs=True),
    bbox_head=dict(
        type='FCOSHead',
        num_classes=81,
        in_channels=256,
        stacked_convs=4,
        feat_channels=256,
        strides=[8, 16, 32, 64, 128]))
# training and testing settings
train_cfg = dict(
    assigner=dict(
        type='MaxIoUAssigner',
        pos_iou_thr=0.5,
        neg_iou_thr=0.4,
        min_pos_iou=0,
        ignore_iof_thr=-1),
    smoothl1_beta=0.11,
    gamma=2.0,
    alpha=0.25,
    allowed_border=-1,
    pos_weight=-1,
    debug=False)
test_cfg = dict(
    nms_pre=1000,
    min_bbox_size=0,
    score_thr=0.05,
    nms=dict(type='nms', iou_thr=0.5),
    max_per_img=100)
# dataset settings
dataset_type = 'CocoDataset'
data_root = '/deep3/coco/'
img_norm_cfg = dict(
    mean=[102.9801, 115.9465, 122.7717], std=[1.0, 1.0, 1.0], to_rgb=False)
data = dict(
    imgs_per_gpu=4,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_train2017.json',
        img_prefix=data_root + 'train2017/',
        img_scale=(1333, 800),
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0.5,
        with_mask=False,
        with_crowd=False,
        with_label=True),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        img_scale=(1333, 800),
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0,
        with_mask=False,
        with_crowd=False,
        with_label=True),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        img_scale=(1333, 800),
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0,
        with_mask=False,
        with_crowd=False,
        with_label=False,
        test_mode=True))
# optimizer
optimizer = dict(
    type='SGD',
    lr=0.01 / 2,  # originally 0.01
    momentum=0.9,
    weight_decay=0.0001,
    paramwise_options=dict(bias_lr_mult=2., bias_decay_mult=0.))
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='constant',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[8, 11])
checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
    interval=500,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
# runtime settings
total_epochs = 12
device_ids = [0, 1]
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/fcos_r50_caffe_fpn_gn_1x_4gpu'
load_from = None
resume_from = None
workflow = [('train', 1)]
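A minimal sketch (assuming the mmdetection v1.x API, i.e. mmcv.Config and mmdet.models.build_detector) to check where GN actually ends up with this config; the config path is the one from the training command above:

import torch.nn as nn
from mmcv import Config
from mmdet.models import build_detector

cfg = Config.fromfile('own/configs/fcos/own_fcos_r50_caffe_fpn_gn_1x_4gpu.py')
cfg.model.pretrained = None  # skip loading pretrained weights for this check

model = build_detector(cfg.model, train_cfg=cfg.train_cfg, test_cfg=cfg.test_cfg)

# Count the norm layers in each part of the detector.
for name, module in [('backbone', model.backbone),
                     ('neck', model.neck),
                     ('bbox_head', model.bbox_head)]:
    n_gn = sum(isinstance(m, nn.GroupNorm) for m in module.modules())
    n_bn = sum(isinstance(m, nn.BatchNorm2d) for m in module.modules())
    print('{}: {} GroupNorm, {} BatchNorm layers'.format(name, n_gn, n_bn))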
The loss trend is:
2019-06-04 17:55:11,177 - INFO - Epoch [1][500/14659] lr: 0.00167, eta: 1 day, 13:52:21, time: 0.777, data_time: 0.028, memory: 8132, loss_cls: 0.7807, loss_reg: 1.0330, loss_centerness: 0.6560, loss: 2.4698
2019-06-04 21:10:14,956 - INFO - Epoch [2][500/14659] lr: 0.00500, eta: 1 day, 11:14:13, time: 0.818, data_time: 0.031, memory: 8136, loss_cls: 0.4015, loss_reg: 0.4589, loss_centerness: 0.6103, loss: 1.4707
2019-06-05 00:30:16,880 - INFO - Epoch [3][500/14659] lr: 0.00500, eta: 1 day, 8:26:15, time: 0.803, data_time: 0.032, memory: 8141, loss_cls: 0.3482, loss_reg: 0.4044, loss_centerness: 0.6042, loss: 1.3569
2019-06-05 03:48:26,608 - INFO - Epoch [4][500/14659] lr: 0.00500, eta: 1 day, 5:13:03, time: 0.808, data_time: 0.032, memory: 8141, loss_cls: 0.3275, loss_reg: 0.3721, loss_centerness: 0.6024, loss: 1.3020
2019-06-05 07:03:33,090 - INFO - Epoch [5][500/14659] lr: 0.00500, eta: 1 day, 1:52:26, time: 0.812, data_time: 0.032, memory: 8141, loss_cls: 0.3063, loss_reg: 0.3597, loss_centerness: 0.6011, loss: 1.2671
2019-06-05 10:19:46,424 - INFO - Epoch [6][500/14659] lr: 0.00500, eta: 22:36:29, time: 0.804, data_time: 0.032, memory: 8173, loss_cls: 0.2965, loss_reg: 0.3442, loss_centerness: 0.5983, loss: 1.2390
2019-06-05 13:37:49,787 - INFO - Epoch [7][500/14659] lr: 0.00500, eta: 19:22:50, time: 0.827, data_time: 0.032, memory: 8173, loss_cls: 0.2826, loss_reg: 0.3351, loss_centerness: 0.5965, loss: 1.2142
2019-06-05 16:52:37,752 - INFO - Epoch [8][500/14659] lr: 0.00500, eta: 16:06:22, time: 0.785, data_time: 0.032, memory: 8173, loss_cls: 0.2783, loss_reg: 0.3305, loss_centerness: 0.5962, loss: 1.2050
2019-06-05 20:05:11,231 - INFO - Epoch [9][500/14659] lr: 0.00050, eta: 12:49:41, time: 0.807, data_time: 0.031, memory: 8174, loss_cls: 0.2473, loss_reg: 0.3042, loss_centerness: 0.5937, loss: 1.1452
2019-06-05 23:24:41,264 - INFO - Epoch [10][500/14659] lr: 0.00050, eta: 9:36:41, time: 0.809, data_time: 0.032, memory: 8174, loss_cls: 0.2214, loss_reg: 0.2812, loss_centerness: 0.5904, loss: 1.0930
2019-06-06 02:43:40,252 - INFO - Epoch [11][500/14659] lr: 0.00050, eta: 6:22:42, time: 0.819, data_time: 0.031, memory: 8174, loss_cls: 0.2136, loss_reg: 0.2748, loss_centerness: 0.5894, loss: 1.0778
2019-06-06 06:02:26,207 - INFO - Epoch [12][500/14659] lr: 0.00005, eta: 3:08:12, time: 0.822, data_time: 0.031, memory: 8174, loss_cls: 0.2079, loss_reg: 0.2638, loss_centerness: 0.5883, loss: 1.0600
At the end of training the loss is about 1.06.
Is there any insight into this problem?
Top GitHub Comments
When you use 1 GPU, I think the lr should be 0.01/4 rather than 0.02/4. Do I misunderstand this parameter? Thanks!
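For reference, a small sketch of the linear scaling rule this refers to, assuming the model-zoo schedule is 4 GPUs x 4 imgs/gpu (total batch 16) with lr=0.01:

def scaled_lr(base_lr=0.01, base_batch=16, num_gpus=4, imgs_per_gpu=4):
    """Scale the learning rate in proportion to the total batch size."""
    return base_lr * (num_gpus * imgs_per_gpu) / base_batch

print(scaled_lr(num_gpus=1))  # 0.0025, i.e. 0.01 / 4 for a single GPU
print(scaled_lr(num_gpus=2))  # 0.005, i.e. 0.01 / 2, the value used in the config above
print(scaled_lr(num_gpus=4))  # 0.01, the reference setting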
GN is only applied on the FCOS head. Could you try again without replacing BN with GN in modules other than the FCOS head?
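A minimal sketch of that suggestion, under the assumption that the model-zoo config keeps GN only in the FCOS head and uses frozen BN in the backbone (norm_cfg=dict(type='BN', requires_grad=False) with norm_eval=True); only the backbone block of the config above would change:

backbone=dict(
    type='ResNet',
    depth=50,
    num_stages=4,
    out_indices=(0, 1, 2, 3),
    frozen_stages=1,
    norm_cfg=dict(type='BN', requires_grad=False),  # frozen BN instead of GN
    norm_eval=True,  # keep BN statistics fixed during training
    style='caffe'),

With BN in the backbone, the open-mmlab://resnet50_caffe pretrained weights (which include BN statistics) can also be loaded as intended, whereas GN layers cannot take them.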