Dual Focal Loss
What is the equivalent of the Adaptive Class Weight layer from the original article? The output of this layer should be the input of the softmax here:

pred = torch.softmax(logits, dim=1)

but no such treatment is applied to this input.
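The placement the question describes, sketched with a hypothetical stand-in for the missing layer (nn.Identity here, used only to show where the ACW output would go; none of these names come from the repository):

import torch
import torch.nn as nn

# Hypothetical stand-in for the ACW layer; the point is the placement:
# the softmax should see the weighted logits, not the raw logits.
acw_layer = nn.Identity()
logits = torch.randn(4, 21)                     # [N, M] class logits
pred = torch.softmax(acw_layer(logits), dim=1)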
Top Results From Across the Web

- Dual Focal Loss to address class imbalance in semantic ...
  "This paper has proposed a novel Dual Focal Loss (DFL) function to address the class imbalance and class weakness problems of semantic ..."
- Dual Focal Loss (DFL) - File Exchange - MATLAB Central
  "Dual Focal Loss (DFL) function [1] alleviates the class imbalance issue in classification as well as semantic segmentation."
- Dual Focal Loss to address class imbalance in semantic ...
  "Focal Loss has proven to be effective at loss balancing by ... In this paper, a novel Dual Focal Loss (DFL) function is ..."
- Dual Focal Loss to address class imbalance in semantic ...
  "This article explores the plethora of traditional machine learning approaches aiming to mitigate the adversarial effects of class imbalance ..."
- [1909.11932] Adaptive Class Weight based Dual Focal Loss ...
  "In this paper, we propose a Dual Focal Loss (DFL) function, as a replacement for the standard cross entropy (CE) function to achieve ..."
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I have just thought of this: if I have not misunderstood what ACW does, then when the model's logits output is an [N, M] tensor (N samples, each with M output values as the logits of each class), ACW provides a weight vector of shape [1, M] that is multiplied into each sample's logits as a learned correction for the bias picked up from the imbalanced dataset. The weight vector is multiplied element-wise into each row of the [N, M] logits tensor, which I think is similar to the behavior of a depthwise convolution. For example, your model structure might be like this:
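A minimal sketch of the kind of structure described, assuming hypothetical names (SegHead, backbone) and per-class logit maps of shape [N, M, H, W]; the grouped 1x1 nn.Conv2d stands in for the ACW weighting:

import torch
import torch.nn as nn

class SegHead(nn.Module):
    # Hypothetical wrapper: `backbone` is assumed to produce per-class
    # logit maps of shape [N, M, H, W]. A 1x1 convolution with
    # groups == channels learns one scalar weight per class channel.
    def __init__(self, backbone, num_classes):
        super().__init__()
        self.backbone = backbone
        self.acw = nn.Conv2d(num_classes, num_classes, kernel_size=1,
                             groups=num_classes, bias=False)

    def forward(self, x):
        logits = self.backbone(x)       # [N, M, H, W] class logits
        weighted = self.acw(logits)     # each class channel scaled by its own weight
        return torch.softmax(weighted, dim=1)

# Usage with a dummy one-layer "backbone":
head = SegHead(nn.Conv2d(3, 21, kernel_size=1), num_classes=21)
pred = head(torch.randn(2, 3, 64, 64))  # [2, 21, 64, 64] class probabilities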
This nn.Conv2d with groups equal to the number of input channels should act similarly to the ACW layer.

Hi, I have no idea whether you are still interested in implementing the ACW layer, but I have just thought that maybe you can use a 1x1 depthwise convolution layer following the logits, which should work in the same way as the ACW layer.
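For flat [N, M] logits, that 1x1 depthwise weighting reduces to an element-wise multiplication by a learned [1, M] vector. A minimal sketch under that assumption (ACWVector is a hypothetical name, not from the repository):

import torch
import torch.nn as nn

class ACWVector(nn.Module):
    # One learnable weight per class, broadcast over the batch; equivalent
    # to a 1x1 depthwise convolution applied to [N, M] logits.
    def __init__(self, num_classes):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(1, num_classes))

    def forward(self, logits):          # logits: [N, M]
        return logits * self.weight    # element-wise, row by row

acw = ACWVector(num_classes=5)
logits = torch.randn(8, 5)
pred = torch.softmax(acw(logits), dim=1)  # the weighted logits feed the softmax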