
torch.autograd.grad()


I’m using a gradient penalty, something like the following:

y = model(x)
loss = some_loss_func(y)

gradients = torch.autograd.grad(
  outputs=y,
  inputs=x,
  grad_outputs=y.new_ones(y.size()),
  create_graph=True,
  retain_graph=True,
  only_inputs=True)[0]
gradients = gradients.view(gradients.size(0), -1)
penalty = (gradients.norm(2, dim=1) ** 2).mean()

loss += penalty
with amp.scale_loss(loss, self.optimizer) as scaled_loss:
  scaled_loss.backward()

This results in the error RuntimeError: expected type torch.cuda.FloatTensor but got torch.cuda.HalfTensor on the .backward() call, in either O1 or O2 mode (but not O0 or O3). When I remove the gradient penalty, the code runs fine in all modes. I’m running on a single GPU.

Is this expected? If so, is there a suggested alternative?

Issue Analytics

Created 5 years ago
Comments: 8 (2 by maintainers)

Top GitHub Comments


mcarilli commented, Mar 22, 2019

I think I know what’s happening. This is a nice one…

With both O1 and O2, batchnorm weights are kept in FP32, which is a requirement to enable cuDNN batchnorm. In O1, batchnorm weights remain FP32 because all weights remain FP32. In O2, batchnorm weights remain FP32 because we explicitly special-case keeping them in FP32, while the rest of the model weights are cast to FP16.

cuDNN batchnorm forward can handle FP16 inputs + FP32 weights without trouble, and cuDNN batchnorm backward can handle FP16 incoming gradients + FP32 weights without trouble. However, when a backward pass with create_graph=True is underway, PyTorch falls back to a non-cuDNN (native) implementation of batchnorm backward that is double-differentiable. This native backward implementation CANNOT handle a combination of FP16 incoming gradients + FP32 weights, which (I suspect) causes your error.
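To make the double-backward path described above concrete, here is a minimal sketch in plain FP32 PyTorch (no apex, CPU is fine). The toy model, tensor shapes, and placeholder loss are assumptions for illustration, not the original poster's code. Because the penalty is built with create_graph=True, calling backward() on it has to differentiate through the batchnorm backward itself, which is the step that reportedly breaks when FP16 gradients meet FP32 weights.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model containing a batchnorm layer (illustration only).
model = nn.Sequential(
    nn.Linear(4, 8), nn.BatchNorm1d(8), nn.ReLU(), nn.Linear(8, 1)
)

# The input must require grad so that d(y)/d(x) can be computed.
x = torch.randn(16, 4, requires_grad=True)
y = model(x)
loss = y.pow(2).mean()  # placeholder standing in for some_loss_func(y)

# create_graph=True records the graph of the gradient computation itself,
# so the penalty below can be backpropagated (double backward).
(gradients,) = torch.autograd.grad(
    outputs=y,
    inputs=x,
    grad_outputs=torch.ones_like(y),
    create_graph=True,
)
gradients = gradients.view(gradients.size(0), -1)
penalty = (gradients.norm(2, dim=1) ** 2).mean()

total = loss + penalty
total.backward()  # also differentiates through the batchnorm backward
```

In pure FP32 this runs without a dtype mismatch, which is consistent with the report that O0 works while O1 and O2 fail.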

There are a couple of approaches that might help here. With O1, you can try registering batchnorm as a blacklist function, which will ensure its inputs and outputs (and therefore its incoming gradients during backward) are cast to FP32:

amp.register_float_function(torch, 'batch_norm')
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

Alternatively, with O2, you can work around it by supplying the override keep_batchnorm_fp32=False, but this is less safe numerically imo.
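A rough sketch of that O2 override, assuming apex is installed and the model already lives on a CUDA device. The helper name initialize_o2_fp16_batchnorm is hypothetical; amp.initialize, opt_level, and keep_batchnorm_fp32 are the apex names used in this thread.

```python
def initialize_o2_fp16_batchnorm(model, optimizer):
    """Hypothetical helper: set up apex O2 with batchnorm cast to FP16."""
    # Imported lazily so this definition loads even without apex installed.
    from apex import amp

    # keep_batchnorm_fp32=False casts batchnorm weights to FP16 along with
    # the rest of the model. Per the comment above, the intent is to avoid
    # the FP16-gradient + FP32-weight combination in the native backward,
    # at the cost of some numerical safety.
    return amp.initialize(
        model, optimizer, opt_level="O2", keep_batchnorm_fp32=False
    )
```

It would be called as model, optimizer = initialize_o2_fp16_batchnorm(model, optimizer) before the training loop, in place of the plain amp.initialize call.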


sunshineInmoon commented, Mar 26, 2019

@mcarilli Thanks for your analysis! It’s OK for my work.
