Questions on reproducing the reported results on MS COCO

See original GitHub issue

Hi,

First, thank you for sharing the exciting work.

I was trying to reproduce the results on the MS COCO dataset with my own training framework. For the baseline I used plain cross-entropy loss, loss_function = AsymmetricLoss(gamma_neg=0, gamma_pos=0, clip=0), and reached an mAP of ~82.5% with a ResNet101 backbone, which is quite close to the result reported in Fig. 8 of the paper.

Then, I replaced the loss function with loss_function = AsymmetricLoss(gamma_neg=4, gamma_pos=1, clip=0.05), keeping all other hyperparameters the same. However, I only got an mAP of ~82.1%.

Also, the traditional focal loss, loss_function = AsymmetricLoss(gamma_neg=2, gamma_pos=2, clip=0), cannot outperform the baseline (~82.5%) under the same configuration, so I am wondering what is going wrong in my training process.
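
For reference, the three configurations I am comparing look roughly like this (assuming the AsymmetricLoss class from the official repository; the exact import path may differ in my framework):

    # The three loss configurations compared above; assumes the AsymmetricLoss
    # implementation from the official ASL repository (import path may differ).
    from src.loss_functions.losses import AsymmetricLoss

    ce_loss    = AsymmetricLoss(gamma_neg=0, gamma_pos=0, clip=0)     # baseline: plain (binary) cross entropy
    asl_loss   = AsymmetricLoss(gamma_neg=4, gamma_pos=1, clip=0.05)  # asymmetric loss from the paper
    focal_loss = AsymmetricLoss(gamma_neg=2, gamma_pos=2, clip=0)     # symmetric focal loss

    # usage in the training loop: logits are raw model outputs,
    # targets are multi-hot label vectors of shape (batch_size, num_classes)
    # loss = asl_loss(logits, targets)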

Could you also please share some training tricks? For example, a code snippet for adjusting the learning rate, training transforms similar to the validation transforms used here, etc. Or do you have any other suggestions?

Thank you.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 3
  • Comments: 12 (4 by maintainers)

Top GitHub Comments

9 reactions
mrT23 commented on Jan 28, 2021

We honestly haven't encountered any case where ASL did not easily outperform cross entropy.

Here are some training tricks we used (they are quite standard and can also be found in public repositories like this one); see if anything differs from your framework:

  • for the learning rate, we use a one-cycle policy (warmup + cosine decay) with the Adam optimizer and a max learning rate of ~2e-4 to 4e-4 (see the optimizer/scheduler sketch after this list)
  • it is very important to also use EMA of the model weights (a minimal sketch follows below)
  • true weight decay of 1e-4 (“true” == no weight decay for batch norm and bias; also covered in the optimizer sketch below)
  • we have our own augmentation package, but it is important to use at least standard AutoAugment
  • cutout of 0.5 (very important)
  • squish resizing, not cropping (important); a transform sketch approximating these augmentation points follows below
  • try replacing ResNet with TResNet; it will give you the same GPU speed with higher accuracy (see the timm snippet below)
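
A minimal PyTorch sketch of the learning-rate and weight-decay points above (model, train_loader and num_epochs are placeholders from your own training loop; this is not our exact code):

    import torch

    def param_groups_true_wd(model, weight_decay=1e-4):
        """Apply weight decay to weights only, not to biases or (batch-)norm parameters."""
        decay, no_decay = [], []
        for name, param in model.named_parameters():
            if not param.requires_grad:
                continue
            # 1-D parameters are biases and norm scales/offsets -> no weight decay
            if param.ndim <= 1 or name.endswith(".bias"):
                no_decay.append(param)
            else:
                decay.append(param)
        return [
            {"params": decay, "weight_decay": weight_decay},
            {"params": no_decay, "weight_decay": 0.0},
        ]

    optimizer = torch.optim.Adam(param_groups_true_wd(model, 1e-4), lr=2e-4)

    # one-cycle policy: warmup followed by cosine decay from max_lr
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer,
        max_lr=2e-4,                        # ~2e-4 to 4e-4 works well
        steps_per_epoch=len(train_loader),
        epochs=num_epochs,
        pct_start=0.2,                      # fraction of the schedule used for warmup
        anneal_strategy="cos",
    )
    # call scheduler.step() after every optimizer.step()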
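
For the EMA point, a simple sketch of keeping an exponential moving average of the weights (the decay value here is an assumption; run validation and save checkpoints with the EMA copy rather than the raw model):

    import copy
    import torch

    class ModelEma:
        """Keep an exponential moving average of a model's parameters."""
        def __init__(self, model, decay=0.9997):
            self.ema = copy.deepcopy(model).eval()
            self.decay = decay
            for p in self.ema.parameters():
                p.requires_grad_(False)

        @torch.no_grad()
        def update(self, model):
            ema_params = dict(self.ema.named_parameters())
            for name, param in model.named_parameters():
                ema_params[name].mul_(self.decay).add_(param.detach(), alpha=1.0 - self.decay)
            # keep buffers (e.g. BatchNorm running statistics) in sync
            ema_buffers = dict(self.ema.named_buffers())
            for name, buf in model.named_buffers():
                ema_buffers[name].copy_(buf)

    # usage: ema = ModelEma(model); call ema.update(model) after each optimizer step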
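
And a rough torchvision approximation of the augmentation points (AutoAugment, cutout, squish resizing). RandomErasing stands in for cutout here, and 448 is the input resolution typically used for COCO multi-label training; our internal augmentation package differs in the details:

    import torchvision.transforms as T

    image_size = 448

    train_transform = T.Compose([
        # "squish": resize both sides to the target resolution instead of crop-resizing
        T.Resize((image_size, image_size)),
        T.AutoAugment(),  # standard AutoAugment (default ImageNet policy)
        T.ToTensor(),
        # cutout-style occlusion: erase one random patch of up to ~50% of the image area
        T.RandomErasing(p=1.0, scale=(0.02, 0.5), value=0),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])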
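
For the last point, TResNet variants are available in the timm package (the model name, pretrained flag and num_classes below are just an illustration; pick the variant that fits your GPU budget):

    import timm

    # 80 classes for MS COCO multi-label classification
    model = timm.create_model("tresnet_l", pretrained=True, num_classes=80)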

That's what I can think of off the top of my head.

5 reactions
mrT23 commented on Jan 20, 2021

I agree.

We cannot share our training code as-is due to commercial limitations, but once public code is shared, we can try to help improve it and reach results similar to the ones in the article.
