Understanding of num_classes in code
Dear Authors, thanks for sharing these high-performance models.
I am reading through the DINO model code and have some questions below. Could you please help me?
- Is num_classes = actual_num_classes + 1 (background class) in detrex? https://github.com/IDEA-Research/detrex/blob/697f5e9dafab6ea1769ec2ea1e0b65351273aa32/projects/dino/modeling/dino.py#L101
I am asking because the background class should also need a label, as I can see in the DINO repo:
https://github.com/IDEA-Research/DINO/blob/66d7173cc4167934381a898b07c08507bdd96b63/models/dino/dino.py#L81
self.label_enc = nn.Embedding(dn_labelbook_size + 1, hidden_dim)
- src_logits.shape is [batch_size, n_dn_queries=900, num_classes] and target_classes.shape is [batch_size, n_dn_queries=900]. Does num_classes include the background class, i.e., is the last logit value for the background class? Do label ids start from 1 (I can see category_id starts from 1 in the COCO dataset)? https://github.com/IDEA-Research/detrex/blob/697f5e9dafab6ea1769ec2ea1e0b65351273aa32/detrex/modeling/criterion/criterion.py#L112
If num_classes already includes the background class, then the +1 in this line is not needed (though cross-entropy loss is not in use, so it does not matter)? https://github.com/IDEA-Research/detrex/blob/697f5e9dafab6ea1769ec2ea1e0b65351273aa32/detrex/modeling/criterion/criterion.py#L103 Two sketches of my current understanding follow below.
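On the label-id question, my understanding is that the detectron2-based data loaders remap the raw 1-based, non-contiguous COCO category_ids to contiguous 0-based labels before they reach the criterion, so the logits are indexed from 0. A minimal sketch (the metadata key is standard detectron2, not detrex-specific):

```python
from detectron2.data import MetadataCatalog

# Raw COCO category_ids run 1..90 with gaps; detectron2 keeps a mapping
# to contiguous 0-based labels, which is what the classification logits index.
meta = MetadataCatalog.get("coco_2017_train")
id_map = meta.thing_dataset_id_to_contiguous_id  # e.g. {1: 0, 2: 1, ..., 90: 79}
```

And on the +1: in the original DETR-style cross-entropy path, the logits have num_classes + 1 channels, and index num_classes is the background ("no object") class, down-weighted by eos_coef. A hedged sketch of that convention (names like eos_coef follow DETR; this is not verbatim detrex code):

```python
import torch
import torch.nn.functional as F

num_classes = 2  # actual foreground classes: 0, 1
eos_coef = 0.1   # down-weight for the background class (DETR convention)

# Background occupies the extra logit index num_classes, hence the +1.
empty_weight = torch.ones(num_classes + 1)
empty_weight[-1] = eos_coef

src_logits = torch.randn(4, 900, num_classes + 1)   # [B, Q, C+1]
target_classes = torch.full((4, 900), num_classes)  # default: all background
loss = F.cross_entropy(src_logits.transpose(1, 2), target_classes, empty_weight)
```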
I was trying to apply the DINO model to my custom dataset. So far it trains, but the performance is not good. I think I might be misunderstanding num_classes.
======= UPDATE =======
I went through the code a second time. It looks like, for focal loss, num_classes only needs to equal actual_num_classes (without +1 for the background class). For example, take a dataset with 2 classes: 0 and 1. The logits for each prediction only need 2 numbers, e.g., [0.0145, 0.0111]. If a prediction is assigned to background class 2, its one-hot encoding would be [0, 0, 1]. We can cut off the last digit of the one-hot, so that [0.0145, 0.0111] is compared against [0, 0].
So with focal loss, we only need to set num_classes = actual_num_classes (without +1 for the background class) everywhere, including the following two locations.
The background-class concept is confined to the SetCriterion class: when producing the one-hot encoding, the last digit is cut off, so background targets become all zeros (a sketch follows below).
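A minimal sketch of that cut-off, paraphrasing the DETR/detrex pattern (shapes chosen for illustration, not verbatim library code):

```python
import torch

num_classes = 2                              # actual classes: 0 and 1
target_classes = torch.tensor([[0, 1, 2]])   # [B=1, Q=3]; 2 = background sentinel

# Build the one-hot with one extra channel so index num_classes is valid...
onehot = torch.zeros(1, 3, num_classes + 1)
onehot.scatter_(2, target_classes.unsqueeze(-1), 1)

# ...then drop that channel: background targets become all-zero vectors,
# matching logits of shape [..., num_classes] for sigmoid focal loss.
onehot = onehot[:, :, :-1]
print(onehot)  # [[[1., 0.], [0., 1.], [0., 0.]]]
```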
Is my understanding correct?
Thanks, Cheng
@HaoZhang534 I realized I can't directly load a pretrained DINO model, as there is an incompatibility in num_classes, num_queries, and num_dn_queries. I can load other weights, like the transformer weights, but modules such as class_embed and label_enc need to be retrained.
I will give it a try and see whether loading a pretrained DINO model helps. Thanks.
@weicheng113 You are welcome. Your concern is reasonable. It's really a problem when objects are crowded. Maybe some improvements could be made, such as only using negative examples when objects are not crowded.