How should I configure the target classes when building my custom dataset (esp. not-present class)?
Hi,
I’m trying to build my own dataset that would work with DETR. I’m a bit confused on the data format that the matcher expects to perform the matching.
From the comment at https://github.com/facebookresearch/detr/blob/a54b77800eb8e64e3ad0d8237789fcbf2f8350c5/models/matcher.py#L44, it seems the targets should not include any 0 / null class padding for queries where no object is present. In other words, if I have a batch with 2 images, one with 3 objects and one with 4 objects, and `num_queries` is 100, then `targets[0]['labels']` has length 3 and `targets[1]['labels']` has length 4. Is that correct? I should not pass the matcher any null / padding class to indicate that something is not present?
If so, my confusion is: how is DETR able to avoid predicting duplicate boxes? Say DETR predicts 50 "present" boxes, and the ground truth I supply has 3 boxes. Then the matcher would pick the best-matching 3 of the 50 predictions and penalize only those 3, while the other 47 boxes are not penalized. They could well be nonsense, but "present", boxes that confuse the result.
I must be missing something here. Please help : )
Top GitHub Comments
Hi @LeoZDong
Your understanding of the matcher is correct. Note however that the matcher is not computing any loss, this is the job of the SetCriterion. Specifically, here is what happens for the label supervision: https://github.com/facebookresearch/detr/blob/a54b77800eb8e64e3ad0d8237789fcbf2f8350c5/models/detr.py#L116-L119
By default, the unmatched queries are supervised to predict the "no-object" class; the corresponding index for this class is `self.num_classes` in the snippet above. For the matched queries, this default label is replaced with the label of the matched ground-truth object.

The matching cost is a combination of a localization cost and a classification cost. There is no general answer to your question; it all depends on the specific situation. If a box has very good localization but very poor classification, then depending on the exact values as well as the coefficients for both costs, it's possible that the GT will be matched to a different box with slightly worse localization but better classification.
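To illustrate the label-target construction described above, here is a minimal plain-Python sketch (the real `SetCriterion.loss_labels` does the same thing with `torch.full` and tensor indexing; `build_label_targets`, the example indices, and the class numbers are all hypothetical):

```python
# Sketch: every query defaults to the "no-object" index (num_classes),
# and only the queries matched by the Hungarian matcher get a real label.
num_classes = 91    # e.g. COCO; the no-object index is therefore 91
num_queries = 100

def build_label_targets(matched, labels):
    """matched: list of (query_idx, gt_idx) pairs from the matcher for one image;
    labels: ground-truth class indices for that image."""
    target_classes = [num_classes] * num_queries  # default: no-object
    for query_idx, gt_idx in matched:
        target_classes[query_idx] = labels[gt_idx]
    return target_classes

# 3 GT objects matched to queries 4, 17, and 63:
t = build_label_targets([(4, 0), (17, 2), (63, 1)], [3, 7, 12])
# The 97 unmatched queries are all supervised toward class 91 (no-object),
# which is what suppresses duplicate "present" predictions at inference.
```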
I hope this helps, feel free to reach out if you have further questions.
Thank you so much for your feedback.
What I mean is: do I need to include images that don't have any airplanes in my dataset, and if so, how? When you annotate an image without creating any bounding boxes (because there is nothing in it), no entry is created in the JSON annotations. I have already trained a model and it performs very well on images that do contain objects, but on images with no objects it performs very badly.
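One way to handle this, assuming a COCO-style annotation layout (the file names, ids, and category here are made up), is to keep the background-only image in the `"images"` list even though it contributes zero entries to `"annotations"`. The dataset then yields an empty target for it, so every query on that image is supervised toward the no-object class:

```python
# Sketch of a COCO-style annotation dict with one negative (object-free) image.
coco = {
    "images": [
        {"id": 1, "file_name": "airplane.jpg", "width": 640, "height": 480},
        {"id": 2, "file_name": "empty_sky.jpg", "width": 640, "height": 480},  # no objects
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1,
         "bbox": [100, 120, 200, 80], "area": 16000, "iscrowd": 0},
        # image_id 2 intentionally has no annotation entries
    ],
    "categories": [{"id": 1, "name": "airplane"}],
}

# Per-image lookup: image 2 maps to an empty list, which the dataset turns
# into empty "labels"/"boxes" tensors, i.e. a valid all-background target.
anns_per_image = {img["id"]: [] for img in coco["images"]}
for ann in coco["annotations"]:
    anns_per_image[ann["image_id"]].append(ann)
```

Note that some COCO loaders (e.g. `torchvision.datasets.CocoDetection`) will happily return an empty annotation list for such images, so the main thing to check is that your dataset wrapper does not crash on, or silently drop, images with zero boxes.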