Training error. bg_num_rois = 0 and fg_num_rois = 0, this should not happen!

Hi, I ran into a problem during training. The error message is as follows:

ValueError: bg_num_rois = 0 and fg_num_rois = 0, this should not happen!

Before the error appeared, the loss had already turned to nan. I tried some suggestions, such as clipping the gradients and reducing the learning rate, but none of them worked.

[session 1][epoch  1][iter  300/2164] loss: nan, lr: 1.00e-04
			fg/bg=(128/0), time cost: 29.000118

I checked my annotation files and some xmin values are 0. I don't know whether that is the problem, because adding 1 to xmin did not help. I also printed gt_boxes and found an xmin of about 64041, which is clearly wrong.

gt_boxes is tensor([[[6.4041e+04, 1.7687e+02, 2.2182e+02, 4.3876e+02, 2.0000e+00],
         [0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
         [0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
         [0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
         [0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00]]],

So I think something is wrong with how gt_boxes is computed in your code, but it is hard to track down. Could you give me a clue about how to fix it? Thanks for your kind reply!

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 2
  • Comments: 5

Top GitHub Comments

marcunzueta commented, Aug 4, 2019 (8 reactions)

Hi, I found the same bug while trying to build my own dataset from the OpenImage images for the Kaggle competition.

Check https://github.com/jwyang/faster-rcnn.pytorch/blob/358cecacf876717ff13988dc6396de10e265279c/lib/datasets/pascal_voc.py#L234-L237 in your newly generated dataset .py file (e.g. openimage.py). I recommend copying pascal_voc.py and working from there. Delete the -1.

Also, in https://github.com/jwyang/faster-rcnn.pytorch/blob/358cecacf876717ff13988dc6396de10e265279c/lib/datasets/imdb.py#L121-L122, delete the -1.

Some objects have bbox coordinates such as 0, 1, 0, which makes the code either crash with an assertion error or produce a nan loss. If your dataset contains bbox coordinates that are either 0 or equal to the image width, apply these changes.

hope it helps! 😃
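
A possible reading of why a coordinate of 0 (or one equal to the image width) ends up as a huge value like the 6.4041e+04 in the question: the sketch below assumes the boxes are stored in an unsigned 16-bit array and that the horizontal flip uses a `width - x - 1` convention. Neither is confirmed in this thread, and `store_as_uint16` and `flip_x` are hypothetical helpers written only for illustration.

```python
from typing import Tuple

UINT16_MODULUS = 2 ** 16  # number of distinct values in an unsigned 16-bit integer


def store_as_uint16(value: int) -> int:
    """Hypothetical helper: mimic writing an int into a uint16 slot (wraps modulo 2**16)."""
    return value % UINT16_MODULUS


def flip_x(x1: int, x2: int, width: int) -> Tuple[int, int]:
    """Mirror the span [x1, x2] horizontally, assuming a 'width - x - 1' convention."""
    return width - x2 - 1, width - x1 - 1


# An annotation with xmin = 0 that is shifted by -1 wraps around.
print(store_as_uint16(0 - 1))

# A box whose xmax equals the image width gets a negative xmin after the flip,
# which wraps around in the same way once stored.
new_x1, new_x2 = flip_x(x1=100, x2=640, width=640)
print(new_x1, new_x2, store_as_uint16(new_x1))
```

Under these assumptions, a wrapped value near 65535 that is later multiplied by an image scale factor slightly below 1 would be consistent with the value printed in the question, but that link is a guess rather than something verified here.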

z-huabao commented, Aug 12, 2019 (5 reactions)

Make sure x2 and y2 stay below the image width and height, because the loader flips both the image and its annotations:

        wh = tree.find('size')
        w, h = int(wh.find('width').text), int(wh.find('height').text)
        for ix, obj in enumerate(objs):
            bbox = obj.find('bndbox')
            # Make pixel indexes 0-based
            x1 = float(bbox.find('xmin').text)
            y1 = float(bbox.find('ymin').text)
            x2 = float(bbox.find('xmax').text)
            y2 = float(bbox.find('ymax').text)
            x1 = max(x1, 0)
            y1 = max(y1, 0)
            x2 = min(x2, w)
            y2 = min(y2, h)
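
The fragment above depends on `tree` and `objs` from the surrounding loader. A self-contained sketch of the same clamping idea is given below. It assumes a Pascal VOC style XML annotation; `SAMPLE_XML` and `load_clamped_boxes` are made-up names for illustration. Unlike the fragment, it clamps to `w - 1` and `h - 1` rather than `w` and `h`, to keep x2 and y2 strictly below the image size, and it also skips boxes that are left with no area.

```python
import xml.etree.ElementTree as ET

# Made-up annotation: one box touching both image borders, one with zero width.
SAMPLE_XML = """
<annotation>
  <size><width>640</width><height>480</height></size>
  <object><name>cat</name>
    <bndbox><xmin>0</xmin><ymin>10</ymin><xmax>640</xmax><ymax>200</ymax></bndbox>
  </object>
  <object><name>dog</name>
    <bndbox><xmin>50</xmin><ymin>60</ymin><xmax>50</xmax><ymax>90</ymax></bndbox>
  </object>
</annotation>
"""


def load_clamped_boxes(xml_text):
    """Return (name, x1, y1, x2, y2) tuples with coordinates clamped to the image."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    w = int(size.find("width").text)
    h = int(size.find("height").text)

    boxes = []
    for obj in root.findall("object"):
        bbox = obj.find("bndbox")
        x1 = max(float(bbox.find("xmin").text), 0.0)
        y1 = max(float(bbox.find("ymin").text), 0.0)
        # Clamp to w - 1 / h - 1 so a flip of the form (w - x2 - 1) cannot go negative.
        x2 = min(float(bbox.find("xmax").text), float(w - 1))
        y2 = min(float(bbox.find("ymax").text), float(h - 1))
        if x2 <= x1 or y2 <= y1:
            continue  # degenerate box with no area; skip it
        boxes.append((obj.find("name").text, x1, y1, x2, y2))
    return boxes


print(load_clamped_boxes(SAMPLE_XML))
```

Running a check like this over every annotation file before training is one way to find the offending boxes without waiting for the loss to become nan.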