
Problems when adding dsntnn into HourglassNet

See original GitHub issue

Hi @anibali! Thanks for your concise code; it makes it very convenient to add your dsntnn module to a Hourglass network. But when I try to do this and train the Hourglass network on the MPII dataset, it does not seem to converge well.

The way I add the dsntnn module into the Hourglass network:

import torch.nn as nn

import dsntnn

# Residual and Hourglass are the building blocks from the original Hourglass
# code, and ref holds dataset constants such as ref.nJoints.

class HourglassDsntNet(nn.Module):
  def __init__(self, nStack, nModules, nFeats, nRegModules):
    super(HourglassDsntNet, self).__init__()
    self.nStack = nStack
    self.nModules = nModules
    self.nFeats = nFeats
    self.nRegModules = nRegModules
    self.conv1_ = nn.Conv2d(3, 64, bias=True, kernel_size=7, stride=2, padding=3)
    self.bn1 = nn.BatchNorm2d(64)
    self.relu = nn.ReLU(inplace=True)
    self.r1 = Residual(64, 128)
    self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
    self.r4 = Residual(128, 128)
    self.r5 = Residual(128, self.nFeats)

    _hourglass, _Residual, _lin_, _tmpOut, _ll_, _tmpOut_, _reg_ = [], [], [], [], [], [], []
    for i in range(self.nStack):
      _hourglass.append(Hourglass(4, self.nModules, self.nFeats))
      for j in range(self.nModules):
        _Residual.append(Residual(self.nFeats, self.nFeats))
      lin = nn.Sequential(nn.Conv2d(self.nFeats, self.nFeats, bias=True, kernel_size=1, stride=1),
                          nn.BatchNorm2d(self.nFeats), self.relu)
      _lin_.append(lin)
      _tmpOut.append(nn.Conv2d(self.nFeats, ref.nJoints, bias=True, kernel_size=1, stride=1))
      _ll_.append(nn.Conv2d(self.nFeats, self.nFeats, bias=True, kernel_size=1, stride=1))
      _tmpOut_.append(nn.Conv2d(ref.nJoints, self.nFeats, bias=True, kernel_size=1, stride=1))

    self.hourglass = nn.ModuleList(_hourglass)
    self.Residual = nn.ModuleList(_Residual)
    self.lin_ = nn.ModuleList(_lin_)
    self.tmpOut = nn.ModuleList(_tmpOut)
    self.ll_ = nn.ModuleList(_ll_)
    self.tmpOut_ = nn.ModuleList(_tmpOut_)

  def forward(self, x):
    x = self.conv1_(x)
    x = self.bn1(x)
    x = self.relu(x)
    x = self.r1(x)
    x = self.maxpool(x)
    x = self.r4(x)
    x = self.r5(x)

    outMap = []
    outReg = []

    for i in range(self.nStack):
      hg = self.hourglass[i](x)
      ll = hg
      for j in range(self.nModules):
        ll = self.Residual[i * self.nModules + j](ll)
      ll = self.lin_[i](ll)
      tmpOutMap = self.tmpOut[i](ll)
      heatmaps = dsntnn.flat_softmax(tmpOutMap)
      outMap.append(tmpOutMap)  # note: the comments below point out this should append heatmaps
      tmpOutReg = dsntnn.dsnt(heatmaps)
      outReg.append(tmpOutReg)

      ll_ = self.ll_[i](ll)
      tmpOut_ = self.tmpOut_[i](tmpOutMap)
      x = x + ll_ + tmpOut_

    return outMap, outReg
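
For reference, the minimal usage pattern from the dsntnn README looks roughly like this (a sketch based on the library's example; FCN and the channel count 16 are placeholders, not part of my code):

class CoordRegressionNetwork(nn.Module):
  def __init__(self, n_locations):
    super(CoordRegressionNetwork, self).__init__()
    self.fcn = FCN()  # placeholder fully-convolutional backbone
    # a 1x1 conv produces one unnormalized heatmap per joint
    self.hm_conv = nn.Conv2d(16, n_locations, kernel_size=1, bias=False)

  def forward(self, images):
    fcn_out = self.fcn(images)
    unnormalized_heatmaps = self.hm_conv(fcn_out)
    # normalize each heatmap into a probability distribution
    heatmaps = dsntnn.flat_softmax(unnormalized_heatmaps)
    # differentiable soft-argmax yielding normalized (x, y) coordinates
    coords = dsntnn.dsnt(heatmaps)
    # the normalized heatmaps are returned for use in js_reg_losses
    return coords, heatmaps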

The way I do the training procedure:

    for i, (input, target2D, target3D, meta) in enumerate(dataLoader):
        input_var = torch.autograd.Variable(input).float().cuda()
        target2D_var = torch.autograd.Variable(target2D).float().cuda()
        target3D_var = torch.autograd.Variable(target3D).float().cuda()

        out_map, out_reg = model(input_var)
        # zero out predictions for joints without annotation
        # (an alternative using a loss mask is sketched after this code block)
        filter = target3D_var[:, :, 2].unsqueeze(dim=2)
        out_reg[0] = out_reg[0] * filter  # note: this assumes nStack == 2
        out_reg[1] = out_reg[1] * filter
        
        loss_map = torch.autograd.Variable(torch.FloatTensor([0])).float().cuda()
        loss_reg = torch.autograd.Variable(torch.FloatTensor([0])).float().cuda()
        loss = torch.autograd.Variable(torch.FloatTensor([0])).float().cuda()
        for k in range(opt.nStack):
            # Per-location euclidean losses
            euc_losses = dsntnn.euclidean_losses(out_reg[k], target3D_var[:, :, :2])
            # Per-location regularization losses
            reg_losses = dsntnn.js_reg_losses(out_map[k], target3D_var[:, :, :2], sigma_t=1.0)
            # Combine losses into an overall loss
            loss += dsntnn.average_loss(euc_losses + reg_losses)
            # average to scalars before accumulating; the per-location loss
            # tensors have shape (batch, joints) and cannot be summed into a
            # scalar accumulator directly
            loss_map += dsntnn.average_loss(euc_losses)
            loss_reg += dsntnn.average_loss(reg_losses)

        if split == 'train':
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
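
As an aside, instead of multiplying out_reg by a filter, the masking could be done inside the loss itself; if I read the dsntnn code correctly, average_loss accepts an optional per-location mask (a sketch assuming that signature):

mask = target3D_var[:, :, 2]  # (batch, joints): 1 if annotated, 0 if not
for k in range(opt.nStack):
    euc_losses = dsntnn.euclidean_losses(out_reg[k], target3D_var[:, :, :2])
    reg_losses = dsntnn.js_reg_losses(out_map[k], target3D_var[:, :, :2], sigma_t=1.0)
    # masked average: unannotated joints contribute nothing to the loss
    loss += dsntnn.average_loss(euc_losses + reg_losses, mask=mask)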

I only trained the network for five epochs, and the results show that it does not seem to converge at all. All the other experiment settings work fine with the plain Hourglass network. So are there any tricks I should add to my code, or did I just add your module in an incorrect way?

It seems that you did some experiments with Hourglass networks in your paper; could you offer any help?

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

1 reaction
sunshineatnoon commented, Oct 1, 2018

Hi @anibali, thanks for the response. I found the bug; it is indeed on my side. Thanks for this awesome work.

1 reaction
Hellomodo commented, May 5, 2018

Thanks for your quick reply! The mistake you pointed out is indeed the reason that 'LM', the JS regularization loss, increases during training:

tmpOutMap = self.tmpOut[i](ll)
heatmaps = dsntnn.flat_softmax(tmpOutMap)
outMap.append(tmpOutMap)  # <-- Should be outMap.append(heatmaps)?

But the still-unresolved problem is that 'LR', the Euclidean loss on the numerical coordinates, keeps converging very slowly, and of course the performance is still bad.
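
With the fix applied, the stack loop becomes (a minimal sketch of just the corrected lines):

      tmpOutMap = self.tmpOut[i](ll)
      heatmaps = dsntnn.flat_softmax(tmpOutMap)
      outMap.append(heatmaps)  # store the normalized heatmaps for js_reg_losses
      tmpOutReg = dsntnn.dsnt(heatmaps)
      outReg.append(tmpOutReg)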


I think I need to dig deeper for some other bugs. Thanks again for your answer!
