NaN for focal loss when training SSD

Hi, I want to use focal loss to train SSD, so I changed the SSD code, but the loss is always NaN. The change I made is in ssd_head.py:

`def loss_single(self, cls_score, bbox_pred, labels, label_weights,
                bbox_targets, bbox_weights, num_total_samples, cfg):
    # loss_cls_all = F.cross_entropy(
    #     cls_score, labels, reduction='none') * label_weights
    # pos_inds = (labels > 0).nonzero().view(-1)
    # neg_inds = (labels == 0).nonzero().view(-1)
    # print("just to see anchor's imbalance: ")
    # print("positive anchors: ")
    # print(pos_inds.size(0))
    # print(neg_inds.size(0))
    # print("negative relative to positive:")
    # print(neg_inds.size(0) / pos_inds.size(0))
    # num_pos_samples = pos_inds.size(0)
    # num_neg_samples = cfg.neg_pos_ratio * num_pos_samples
    # if num_neg_samples > neg_inds.size(0):
    #     num_neg_samples = neg_inds.size(0)
    # topk_loss_cls_neg, _ = loss_cls_all[neg_inds].topk(num_neg_samples)
    # loss_cls_pos = loss_cls_all[pos_inds].sum()
    # loss_cls_neg = topk_loss_cls_neg.sum()
    # loss_cls = (loss_cls_pos + loss_cls_neg) / num_total_samples

    labels = torch.tensor(labels, dtype=torch.long)
    labels = torch.nn.functional.one_hot(labels, num_classes=21).cuda()

    loss_cls = py_sigmoid_focal_loss(
        cls_score, labels, label_weights, avg_factor=num_total_samples)

    loss_bbox = smooth_l1_loss(
        bbox_pred,
        bbox_targets,
        bbox_weights,
        beta=cfg.smoothl1_beta,
        avg_factor=num_total_samples)
    return loss_cls, loss_bbox #loss_cls[None]`

I commented out the cross-entropy loss and switched to focal loss as shown above. Before calling the focal loss, I convert the labels to one-hot format. The training output: [Screenshot from 2019-08-23 16-46-29]

I searched the internet and tried adding `pred_sigmoid = pred_sigmoid.clamp(min=0.0001, max=1.0)` in `py_sigmoid_focal_loss`, but it did not help.
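As a side note on the clamp above: with `max=1.0`, a saturated sigmoid can still be exactly 1.0, so any `log(1 - p)` term computed from the clamped probability would still hit `log(0)`. The sketch below is plain Python on a single logit, not mmdetection's implementation; the helper names (`softplus`, `focal_loss_from_logit`) and the defaults `gamma=2.0`, `alpha=0.25` are assumptions for illustration. It computes the log terms from the logit instead of from the probability, which is the same idea PyTorch's `F.binary_cross_entropy_with_logits` uses. Whether this was the cause of the NaN in this issue is not established by the thread.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x):
    """Sigmoid that avoids overflow for large negative inputs."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def focal_loss_from_logit(logit, target, gamma=2.0, alpha=0.25):
    """Sigmoid focal loss for one logit and a binary target (0 or 1).

    Hypothetical helper for illustration only.
    """
    p = sigmoid(logit)
    # Cross-entropy from the logit: -log(p) = softplus(-x), -log(1 - p) = softplus(x).
    bce = softplus(-logit) if target == 1 else softplus(logit)
    # Probability assigned to the wrong class; this is the modulating term.
    p_wrong = (1.0 - p) if target == 1 else p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return alpha_t * (p_wrong ** gamma) * bce


# A confident wrong prediction: the sigmoid saturates to exactly 1.0,
# so 1 - p is 0 and log(1 - p) would be undefined, yet the loss stays finite.
p = sigmoid(40.0)
loss = focal_loss_from_logit(40.0, target=0)
print(p == 1.0, loss)
```

If the loss is already computed from logits and NaN still appears, the values feeding the loss (weights, targets, learning rate, gradients) are the next thing to inspect.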

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

1 reaction
leewi9 commented, Mar 15, 2021

@yzl96 Hello, I have the same problem. Could you note here how you modified the code? Thanks.

1 reaction
yzl96 commented, Nov 14, 2019

Focal loss addresses the class imbalance in detection problems. So when I use focal loss in SSD, I disable the hard negative mining step of the original SSD and make sure that most positive and negative samples are included in the focal loss.
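For readers unfamiliar with the step being disabled, here is a minimal plain-Python sketch of the hard negative mining selection that the commented-out code in the question performs. The function name and the list-based inputs are assumptions for illustration; the default ratio of 3 stands in for `cfg.neg_pos_ratio`, and the normalization by `num_total_samples` is left out.

```python
def hard_negative_mining_loss(pos_losses, neg_losses, neg_pos_ratio=3):
    """Sum all positive losses plus only the largest negative losses.

    At most `neg_pos_ratio * len(pos_losses)` negatives are kept.
    Hypothetical helper for illustration only.
    """
    num_neg = min(neg_pos_ratio * len(pos_losses), len(neg_losses))
    # Keep the hardest negatives, i.e. those with the largest loss values.
    top_neg = sorted(neg_losses, reverse=True)[:num_neg]
    return sum(pos_losses) + sum(top_neg)


# One positive anchor and five negatives: only the three hardest negatives count.
total = hard_negative_mining_loss([2.0], [0.1, 0.5, 0.05, 1.2, 0.3])
print(total)
```

With focal loss, this selection is dropped: every anchor contributes, and the modulating factor down-weights the easy negatives instead.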

Thank you, I have solved the problem: after I limited the gradient, the loss decreases. However, the performance with focal loss is lower than with the 1:3 loss, and I have not figured out why.
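The comment does not say how the gradient was limited or which threshold was used. One common way to do it is clipping by global L2 norm, sketched below in plain Python on a flat list of gradient values. The function name and the `max_norm` value in the example are assumptions; in PyTorch, the corresponding utility is `torch.nn.utils.clip_grad_norm_`.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale gradient values so that their L2 norm is at most `max_norm`.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    Hypothetical helper for illustration only.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        # Already within the limit (this also covers an all-zero gradient).
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm


# A gradient with norm 5 is rescaled to norm 1; its direction is unchanged.
clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
print(norm_before, clipped)
```

Clipping bounds the size of each update, which can stop a loss from diverging, but it does not by itself explain a gap in final accuracy.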
