custom training asserts with "degenerate bboxes" over and over - but bboxes look correct, any debugging insight?
I’m trying to get my custom dataset working, but I can’t get past 8 or so images via get_item before it asserts that my bboxes are bad. I pull the flagged image, it flags the next one, I pull that one, it flags the next…
From reading the code, it wants to check that x1 and y1 are at least as large as x0 and y0, which is a great check:
55 assert (boxes1[:, 2:] >= boxes1[:, :2]).all()
But it keeps flagging images whose boxes, when I unwind them from COCO format, should be fine. Any insights? I was not able to print the boxes1 (200, 4) and boxes2 (12, 4) tensors for some reason, so I couldn’t see what it was actually computing (printing threw an odd GPU ‘formatting’ error).
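(A minimal sketch of the kind of debug print I was attempting inside generalized_box_iou, assuming boxes1/boxes2 are CUDA tensors; moving them to CPU first usually sidesteps the formatting error. The helper name is mine:)

```python
def dump_offending_boxes(boxes1, boxes2):
    # detach from the graph and move to CPU before printing;
    # this avoids most device-side "formatting" errors
    print("boxes1:", boxes1.detach().cpu())
    print("boxes2:", boxes2.detach().cpu())
    # show only the rows that would trip the assert
    bad = ~(boxes1[:, 2:] >= boxes1[:, :2]).all(dim=1)
    if bad.any():
        print("degenerate rows in boxes1:", boxes1[bad].detach().cpu())
```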
For example, it flagged this image as bad. Here’s the JSON for it in COCO format, 6 classes. (One box will surround the other 5 objects, btw, as it’s a malaria reader, so I’m not sure if a box encompassing other boxes is really the issue?)
{"id": "c33c3539-8bd1-48e0-8065-831709e5e64d", "image_id": 3091210, "category_id": 2905442, "segmentation": null, "area": 0, "bbox": **[499, 121, 177, 80]**, "iscrowd": 0},
{"id": "0023d71e-e1e9-4862-a0b8-6e2bc3982b3b", "image_id": 3091210, "category_id": 2905422, "segmentation": null, "area": 0, "bbox": **[492, 523, 187, 163]**, "iscrowd": 0},
{"id": "726fdfbc-3801-409d-ab75-ccf951e74316", "image_id": 3091210, "category_id": 2905421, "segmentation": null, "area": 0, "bbox": **[496, 428, 181, 93],** "iscrowd": 0},
{"id": "2bf85a8e-108d-4875-b0f5-47c8e5cb13e0", "image_id": 3091210, "category_id": 2905420, "segmentation": null, "area": 0, "bbox": **[494, 272, 186, 169]**, "iscrowd": 0},
{"id": "8669c13a-1205-4e94-a645-18e2ffa491d0", "image_id": 3091210, "category_id": 2905419, "segmentation": null, "area": 0, "bbox": **[489, 127, 193, 557]**, "iscrowd": 0},
{"id": "d9619859-e0ef-4632-ad51-7237a5760a5e", "image_id": 3091210, "category_id": 2905418, "segmentation": null, "area": 0, "bbox": **[495, 203, 182, 73]**, "iscrowd": 0},
And as a check for myself, here’s the COCO format: the COCO bounding box format is [top-left x position, top-left y position, width, height].
All the bboxes it flags have positive width and height, so x1 and y1 must be larger than x0 and y0; only a negative width or height added to the original x0 or y0 could make them smaller. So I’m unclear what it is asserting on, or why.
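(A minimal sketch of that unwind on the first annotation above; the helper name is mine, not from the repo:)

```python
def xywh_to_xyxy(bbox):
    """Convert a COCO [x, y, width, height] box to [x0, y0, x1, y1]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]

# first annotation above: width and height are positive, so the assert should pass
print(xywh_to_xyxy([499, 121, 177, 80]))  # -> [499, 121, 676, 201]
# x1 = 676 >= x0 = 499 and y1 = 201 >= y0 = 121, i.e. not degenerate
```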
But it asserts here:
~/detr/util/box_ops.py in generalized_box_iou(boxes1, boxes2)
53 #print(boxes1)
54 #print(boxes2)
---> 55 assert (boxes1[:, 2:] >= boxes1[:, :2]).all()
56 assert (boxes2[:, 2:] >= boxes2[:, :2]).all()
57 iou, union = box_iou(boxes1, boxes2)
I’ve removed 15+ images trying to get it to actually train, but it just keeps flagging more and more as having invalid bboxes. I remove one image, then it asserts on the next one, and in reviewing the ones it flags vs. the ones it lets pass, I don’t see any real difference. (I have trained with this same dataset on EfficientDet, so I know the dataset is reasonable.)
So any insight into debugging this, or into what might be awry, would be appreciated. Thanks!
Top GitHub Comments
Hi @lessw2020, apologies for the confusion: the class IDs need to be remapped to [0, 6]. Basically you want
tgt_ids.max() < num_classes
EDIT: to clarify, in our case for COCO there are 80 classes with labels in [0, 90], and for simplicity we don’t do any remapping, so we use num_classes=91 (so that we satisfy the inequality above). It doesn’t matter that some ids will never be used (it’s a slight waste of parameters, but negligible in this case). In your case that won’t work, though: you really don’t want a softmax over 2.9M elements, so remapping is the way to go.
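(A minimal sketch of that remapping, assuming anns is the list of annotation dicts shown above; the variable names are mine, not part of detr:)

```python
# build a contiguous label mapping from the raw COCO category ids
raw_ids = sorted({a["category_id"] for a in anns})  # e.g. [2905418, ..., 2905442]
id_map = {raw: i for i, raw in enumerate(raw_ids)}  # 2905418 -> 0, ..., 2905442 -> 5

for a in anns:
    a["category_id"] = id_map[a["category_id"]]

# now the max label is 5, so num_classes=6 satisfies the inequality above
```

The same mapping should also be applied to the "categories" section of the annotation file so the ids stay consistent.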
@lessw2020, from debug2.txt, the error indicates that your class probability tensor has fewer elements than the ground-truth indices.
If you add a print(tgt_ids.max()) in your code, you’ll see that it is larger than 6, which means there might be an issue with your dataset (you have more classes than you thought). I believe this is probably the issue you are facing.
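(A minimal sketch of that kind of pre-flight check, run once over the targets before training; the function name is mine, not from detr, though the targets-as-dicts-with-"labels" layout matches the repo:)

```python
def check_labels(targets, num_classes):
    """Fail fast if any ground-truth label would break tgt_ids.max() < num_classes."""
    max_id = max(int(t["labels"].max()) for t in targets)
    assert max_id < num_classes, (
        f"max label {max_id} >= num_classes {num_classes}; remap your category ids"
    )
```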
As an unrelated note, I saw that you are passing --no_aux_loss to the model. Note that our best results are obtained with aux_loss: the evaluation code doesn’t need aux loss because it’s just evaluation and it is slightly faster without it, but for training in general it’s better to use aux_loss.