Evaluation results vary for the same saved weights.
❓ Questions and Help
I have a question about evaluating my model.
I ran the command below several times and it returned different mAP results each time.
python -m torch.distributed.launch --nproc_per_node=1 tools/test_net.py --config-file "stand_file/e2e_faster_rcnn_R_50_FPN_1x.yaml" TEST.IMS_PER_BATCH 16
I would like to know why this happened.
Issue Analytics
- Created: 4 years ago
- Comments: 6 (3 by maintainers)
Top GitHub Comments
This probably happens because when you batch different images together, they get different paddings, and that slightly affects the output of the model (i.e., the predictions).
In your case you are using a batch size of 16, and by default we shuffle the images during evaluation (see https://github.com/facebookresearch/maskrcnn-benchmark/blob/55796a04ea770029a80cf5933cc5c3f3f6fa59cf/maskrcnn_benchmark/data/build.py#L126), so every run sees different batches of images, and thus different paddings and different results.
Try removing the shuffling or setting the batch size to 1 (which is the most robust solution anyway).
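The padding effect is easy to reproduce outside the detection codebase. The sketch below is a minimal, hypothetical example (plain PyTorch, not maskrcnn-benchmark code): it runs the same image through a tiny two-layer CNN once on its own and once zero-padded to a larger canvas, as happens when it is batched with a larger image, and shows that the activations near the border differ slightly.

```python
# Minimal sketch (hypothetical, not maskrcnn-benchmark code): show that the
# amount of zero padding added when batching images of different sizes can
# slightly change a CNN's activations near the image border.
import torch
import torch.nn as nn

torch.manual_seed(0)

# A tiny stand-in for a detection backbone: two convs with biases.
net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 8, kernel_size=3, padding=1),
).eval()

img = torch.randn(1, 3, 64, 64)

# Case 1: the image evaluated alone (no extra padding needed).
with torch.no_grad():
    out_alone = net(img)

# Case 2: the same image padded to 96x96, as if batched with a larger image.
padded = torch.zeros(1, 3, 96, 96)
padded[:, :, :64, :64] = img
with torch.no_grad():
    out_padded = net(padded)[:, :, :64, :64]  # crop back to the original region

# The first conv's bias makes the padded area non-zero, so the second conv
# sees different context at the border than it does with implicit zero padding.
print((out_alone - out_padded).abs().max())  # small but non-zero difference
```

With a batch size of 1 every image is padded the same way on every run, so repeated evaluations give identical results; with the command from the question that means passing TEST.IMS_PER_BATCH 1 instead of 16.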
Thanks for your response. I found that I had added a new transform and forgot to disable it during testing. It works well now.
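For reference, a common way to avoid that mistake is to build the transform pipeline from a flag so that data augmentation is only applied during training. The snippet below is a generic, hypothetical sketch using torchvision-style transforms, not the actual code from this repository:

```python
# Hypothetical sketch: gate augmentation on an is_train flag so that the
# test-time pipeline only resizes and normalizes, with no random transforms.
from torchvision import transforms

def build_transforms(is_train: bool):
    ops = []
    if is_train:
        # Augmentations belong only in the training pipeline.
        ops.append(transforms.RandomHorizontalFlip(p=0.5))
    ops += [
        transforms.Resize((800, 800)),  # example fixed size
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ]
    return transforms.Compose(ops)

train_tf = build_transforms(is_train=True)
test_tf = build_transforms(is_train=False)  # deterministic at evaluation time
```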