Training error. bg_num_rois = 0 and fg_num_rois = 0, this should not happen!
Hi, I ran into a problem during training. The error message is as follows:
ValueError: bg_num_rois = 0 and fg_num_rois = 0, this should not happen!
Before the error appeared, I noticed that the loss had already turned to nan. I followed some suggestions such as clipping the gradient or reducing lr, but none of them worked.
[session 1][epoch 1][iter 300/2164] loss: nan, lr: 1.00e-04
fg/bg=(128/0), time cost: 29.000118
I checked my annotation files and found that some xmin values are 0. I don't know whether that is the problem, because I changed those xmin values to 1 and it still did not work.
I then printed gt_boxes and found an xmin of about 64041, which is clearly wrong.
gt_boxes is tensor([[[6.4041e+04, 1.7687e+02, 2.2182e+02, 4.3876e+02, 2.0000e+00],
[0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
[0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
[0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
[0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00]]],
So I think something is wrong in how your code computes gt_boxes, but it is hard to track down. Could you give me a clue about how to fix it?
Thanks for your kind reply!
Issue Analytics
- State:
- Created 4 years ago
- Reactions:2
- Comments:5
Top GitHub Comments
Hi, I found the same bug while trying to create my own dataset from the OpenImage images for the Kaggle competition.
Check https://github.com/jwyang/faster-rcnn.pytorch/blob/358cecacf876717ff13988dc6396de10e265279c/lib/datasets/pascal_voc.py#L234-L237 in your newly generated dataset .py file, e.g. openimage.py. I recommend you copy pascal_voc.py and work from there. Delete the -1.
Moreover, in https://github.com/jwyang/faster-rcnn.pytorch/blob/358cecacf876717ff13988dc6396de10e265279c/lib/datasets/imdb.py#L121-L122, delete the -1 as well.
There are objects whose bbox values are 0,1,0, for example, which makes the code either crash with an assertion error or produce a nan loss. If your dataset has bbox annotations that are either 0 or equal to the image width, apply these changes.
hope it helps! 😃
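To illustrate why a coordinate of 0 can turn into a huge number like the one in the printed gt_boxes, here is a minimal sketch. It assumes the dataset loader subtracts 1 from each annotation coordinate and keeps the result in an unsigned 16-bit slot; I have not checked that against the linked revision, so treat both points as assumptions. The helper names `as_uint16` and `load_xmin` are made up for this example and are not part of the repository.

```python
UINT16_MODULUS = 2 ** 16  # number of distinct values in an unsigned 16-bit integer


def as_uint16(value):
    """Emulate storing an integer in an unsigned 16-bit slot (wraps around)."""
    return int(value) % UINT16_MODULUS


def load_xmin(xmin_from_xml, subtract_one=True):
    """Hypothetical helper: turn an annotation xmin into the stored coordinate.

    subtract_one=True mimics a loader that converts 1-based coordinates
    to 0-based ones; this is an assumption about the loader, not a quote.
    """
    offset = 1 if subtract_one else 0
    return as_uint16(xmin_from_xml - offset)


print(load_xmin(0))                      # 0 - 1 wraps around to 65535
print(load_xmin(0, subtract_one=False))  # stays 0 once the -1 is removed
print(load_xmin(48))                     # ordinary case: 47
```

If that assumption holds, any annotation whose xmin or ymin is already 0 ends up near 65535 before rescaling, which would be consistent with the value seen above.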
Make sure x2 and y2 are smaller than the image width, because the code flips both the image and its annotations.
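A rough sketch of that check, assuming the horizontal flip maps a box to (width - x2 - 1, width - x1 - 1). That formula is my assumption about what the linked imdb.py lines do, and `flip_x` and `is_safe_to_flip` are hypothetical names used only here.

```python
def flip_x(x1, x2, width, subtract_one=True):
    """Mirror a box horizontally inside an image of the given width.

    subtract_one=True mimics the assumed '- 1' in the flip code;
    pass False to see the effect of deleting it.
    """
    offset = 1 if subtract_one else 0
    return width - x2 - offset, width - x1 - offset


def is_safe_to_flip(x1, x2, width, subtract_one=True):
    """Return True when the flipped box is non-negative and well ordered."""
    new_x1, new_x2 = flip_x(x1, x2, width, subtract_one)
    return 0 <= new_x1 <= new_x2


# A box that touches the right edge (x2 == width) breaks the flip with the -1 ...
print(flip_x(10, 100, 100))                               # (-1, 89)
print(is_safe_to_flip(10, 100, 100))                      # False
# ... and survives once the -1 is removed.
print(is_safe_to_flip(10, 100, 100, subtract_one=False))  # True
```

Running a check like this over every annotation before training would point at the offending boxes directly instead of waiting for the nan loss.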