COCO AP of FPN with ResNet-50 backbone for object detection

Hi @fmassa, thanks for the great code. I am confused about the COCO AP of Faster R-CNN ResNet-50 FPN. From the documentation, #925, and the source code, I guess that the Faster R-CNN ResNet-50 FPN model was trained with the following hyperparameters and got 37.0 AP. Am I right?

Repo	Network	box AP	scheduler	epochs	lr-steps	batch size	lr
vision	R-50 FPN	37.0	2x	26	16, 22	16	0.02

batch_size = 2 * 8 (NUM_GPU) = 16
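
For reference, the lr-steps column above can be read as a step schedule. The sketch below is a minimal illustration in plain Python, not torchvision's training code: the function name `lr_at_epoch` is hypothetical, and the decay factor `gamma=0.1` is an assumption, since the table lists only the base lr and the step epochs.

```python
def lr_at_epoch(epoch, base_lr=0.02, milestones=(16, 22), gamma=0.1):
    """Return the learning rate in effect during a 0-indexed epoch.

    base_lr and milestones come from the table above.
    gamma is an ASSUMED decay factor; the table does not state it.
    """
    # Count how many milestones have already been reached.
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops


if __name__ == "__main__":
    # Print the rate just before and just after each step.
    for epoch in (0, 15, 16, 21, 22, 25):
        print(f"epoch {epoch:2d}: lr = {lr_at_epoch(epoch):.5f}")
```

Under these assumptions, the rate stays at the base value until epoch 16, then is multiplied by `gamma` at each listed step.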

However, I noticed that maskrcnn-benchmark and Detectron seem to achieve better box AP, as shown below:

Repo	Network	box AP	scheduler	epochs	lr-steps	batch size	lr
maskrcnn-benchmark	R-50 FPN	36.8	1x	12.28	8.19, 10.92	16	0.02
Detectron	R-50 FPN	36.7	1x	12.28	8.19, 10.92	16	0.02
Detectron	R-50 FPN	37.9	2x	24.56	16.37, 21.83	16	0.02

From the maskrcnn-benchmark 1x config: epochs = 90000 (steps) * 16 (batch size) / 117266 (training images per epoch) = 12.28. By the way, COCO2017 has 118287 training images, but only 117266 of them contain at least one object.
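
The conversion above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the helper name `iterations_to_epochs` is hypothetical, and the step iterations (60000 and 80000 for 1x, doubled for 2x) are assumed from the maskrcnn-benchmark configs rather than stated in this post.

```python
TRAIN_IMAGES = 117266  # COCO2017 training images with at least one object
BATCH_SIZE = 16        # 2 images per GPU * 8 GPUs


def iterations_to_epochs(iterations, batch_size=BATCH_SIZE, num_images=TRAIN_IMAGES):
    """Convert a number of training iterations into (fractional) epochs."""
    return iterations * batch_size / num_images


if __name__ == "__main__":
    # (total iterations, lr-step iterations); the step values are ASSUMED.
    schedules = {
        "1x": (90000, (60000, 80000)),
        "2x": (180000, (120000, 160000)),
    }
    for name, (max_iter, steps) in schedules.items():
        total = round(iterations_to_epochs(max_iter), 2)
        step_epochs = [round(iterations_to_epochs(s), 2) for s in steps]
        print(name, "epochs:", total, "lr-steps:", step_epochs)
```

With these inputs the helper reproduces the epochs and lr-steps columns of the maskrcnn-benchmark and Detectron rows.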

I would like to know what causes this gap:

37.0 (torchvision 2x) vs 36.8 (maskrcnn-benchmark 1x)
37.0 (torchvision 2x) vs 37.9 (Detectron 2x)

Also, could you share the result for a model trained with the 1x schedule?

Repo	Network	box AP	scheduler	epochs	lr-steps	batch size	lr
vision	R-50 FPN	??	1x	13	8, 11	16	0.02

Thank you!

Top GitHub Comments

potterhsu commented, Apr 17, 2020

Sure, I’ve sent PR #2113 for this.

fmassa commented, Apr 9, 2020

Hi,

There are a few differences between the implementations that lead to this gap in mAP:

maskrcnn-benchmark and detectron2 use ResNet50 Caffe2 pre-trained weights with the stride in the 1x1 convolution, while the torchvision models use the torchvision resnet50 models (with the stride in the 3x3 convolution)
we use l1 loss for both regression losses, instead of smooth_l1 – this is for simplicity, but gives a different trade-off between mAP and mAP@50
we use slightly different weight initializations for some layers – this is for simplicity

These all add up to the discrepancy that you see. Given the complexity of Faster R-CNN as a model, every tiny detail can slightly change the training dynamics, while in the end (after more epochs) producing comparable models. So, for the sake of uniformity and simplicity, we decided to make this compromise.
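
To make the loss difference mentioned above concrete, here is a minimal per-element sketch in plain Python. It is not the torchvision or maskrcnn-benchmark implementation: the function names are hypothetical, the formula follows the usual smooth L1 definition, and `beta=1.0` is an assumed threshold (other codebases may use a smaller value).

```python
def l1_loss(diff):
    """Plain L1 penalty on a single regression residual."""
    return abs(diff)


def smooth_l1_loss(diff, beta=1.0):
    """Smooth L1 penalty: quadratic below beta, linear above it.

    beta=1.0 is an ASSUMED threshold chosen for illustration.
    """
    abs_diff = abs(diff)
    if abs_diff < beta:
        # Quadratic region: small residuals get a gentler penalty and gradient.
        return 0.5 * abs_diff ** 2 / beta
    # Linear region: same slope as L1, shifted down by a constant.
    return abs_diff - 0.5 * beta


if __name__ == "__main__":
    for diff in (0.1, 0.5, 1.0, 2.0):
        print(f"diff={diff}: l1={l1_loss(diff)}, smooth_l1={smooth_l1_loss(diff)}")
```

The two losses differ most for small residuals, where L1 keeps a constant gradient and smooth L1 does not, which is one plausible reason the trade-off between mAP and mAP@50 shifts.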

IIRC, training on the 1x schedule gives ~36.3 mAP, but I can’t find the logs anymore and would need to re-train the model to be sure.

Let me know if you have more questions!
