Evaluation: bad results with batch_size=4
Hello,

I tried running the `main.py` script in evaluation mode with `batch_size=4`, using the provided pre-trained weights for the DETR-R50 model, and I obtained mAP=0.26. Changing `batch_size` to 1, I obtained the reported mAP=0.42.
Because of the padding, I can understand how the batch size affects the results. However, looking at the logs that you provided, it seems that you obtained mAP=0.42 at the end of training (which, according to the paper, used `batch_size=4`).
Could you please tell me which batch size you used for the validation step during your training runs?
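To make the padding effect concrete: when images of different sizes are batched together, each is padded to the batch's maximum height and width, so the same image can receive different amounts of padding depending on its batch companions. The sketch below is a simplified NumPy illustration with a hypothetical `pad_batch` helper, not DETR's actual collate code (which builds a `NestedTensor` in PyTorch); it just shows the padded tensor and the boolean mask marking padded pixels.

```python
import numpy as np

def pad_batch(images):
    """Pad a list of (C, H, W) arrays to the batch's max H and W.

    Returns the padded batch and a boolean mask where True marks
    padded (invalid) pixels. Hypothetical helper for illustration.
    """
    max_h = max(img.shape[1] for img in images)
    max_w = max(img.shape[2] for img in images)
    c = images[0].shape[0]
    batch = np.zeros((len(images), c, max_h, max_w), dtype=images[0].dtype)
    mask = np.ones((len(images), max_h, max_w), dtype=bool)  # start all-padding
    for i, img in enumerate(images):
        _, h, w = img.shape
        batch[i, :, :h, :w] = img   # copy real pixels
        mask[i, :h, :w] = False     # mark them as valid
    return batch, mask

# Two images of different sizes: both get padded to (512, 640).
imgs = [np.random.rand(3, 480, 640), np.random.rand(3, 512, 512)]
batch, mask = pad_batch(imgs)

# With batch size 1, the image alone defines the max size: no padding at all.
b1, m1 = pad_batch([imgs[0]])
```

This is why `batch_size=1` evaluation feeds the model inputs with no padding whatsoever, while `batch_size=4` generally does not.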
Issue Analytics
- State:
- Created 3 years ago
- Comments: 9 (3 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hi @netw0rkf10w, please find the commands and evaluation results of the baseline DETR model with different batch sizes below.
- batch size 1
- batch size 2
- batch size 4
- batch size 8
- batch size 16
- batch size 32
- batch size 48
I just wanted to clarify that @szagoruyko's tests are for inference only, on a model that was trained with a batch size of 4 per card. For such a model, as his results show, the inference batch size (per card) does not influence the AP much.
However, I want to stress that a model trained with bs=1 per card won't do well if tested with bs > 1. The reason is that it will never have encountered padding at train time, and will thus be confused when encountering it at test time. The drop is estimated at ~10 mAP if you train with 1 img/card and test with 2 img/card. This applies to our DC5 models in particular.
In conclusion, seeing some padding during training is necessary, but DETR is then robust to varying amounts of padding.
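The train-time point can be quantified with a small back-of-the-envelope helper (hypothetical, for illustration): a model trained with one image per card sees a padding fraction of exactly zero, while any mixed-size batch introduces padded regions.

```python
def padding_fraction(sizes):
    """Fraction of pixels that are padding when images of the given
    (H, W) sizes are batched by padding each to the batch max H and W.

    Hypothetical helper for illustration, not part of the DETR codebase.
    """
    max_h = max(h for h, _ in sizes)
    max_w = max(w for _, w in sizes)
    total = len(sizes) * max_h * max_w   # pixels in the padded batch
    real = sum(h * w for h, w in sizes)  # pixels carrying image content
    return 1.0 - real / total

# A single image batched alone needs no padding:
padding_fraction([(480, 640)])               # → 0.0
# Mixed orientations in one batch create padded regions:
padding_fraction([(480, 640), (640, 480)])   # → 0.25
```

So a bs=1-trained model has only ever seen a padding fraction of 0.0, and at test time with bs > 1 it suddenly faces inputs where a quarter or more of the pixels are padding it was never trained to ignore.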
I think the question at hand was resolved, and as such I’m closing this. Feel free to reach out if you have further concerns.