Error if the gradient of tensor is None.
The gradient of a tensor may be None if the tensor is computed in the forward pass but never receives a gradient in the backward pass.
For example, I’m fine-tuning a model with BERT using the second-to-last encoded layer. The last layer is still computed in the forward pass, but its gradients are never calculated in the backward pass.
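A minimal sketch of how this situation arises (the module names are illustrative, not from the issue): a layer that runs in forward() but whose output never reaches the loss keeps grad equal to None after backward().

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(4, 4)
        self.unused = nn.Linear(4, 4)

    def forward(self, x):
        _ = self.unused(x)   # computed in the forward pass, but the result is discarded
        return self.used(x)

model = Model()
loss = model(torch.randn(2, 4)).sum()
loss.backward()

print(model.used.weight.grad is None)    # False: gradient was computed
print(model.unused.weight.grad is None)  # True: no gradient ever flows here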
The following is the error message.
File "/usr/lib/python3.7/site-packages/apex-0.1-py3.7-linux-x86_64.egg/apex/optimizers/fp16_optimizer.py", line 147, in step
grads_groups_flat.append(_flatten_dense_tensors([p.grad for p in group]))
File "/usr/lib/python3.7/site-packages/torch/_utils.py", line 194, in _flatten_dense_tensors
flat = torch.cat([t.contiguous().view(-1) for t in tensors], dim=0)
File "/usr/lib/python3.7/site-packages/torch/_utils.py", line 194, in <listcomp>
flat = torch.cat([t.contiguous().view(-1) for t in tensors], dim=0)
AttributeError: 'NoneType' object has no attribute 'contiguous'
Here is an actual fix:
In apex/optimizers/fp16_optimizer.py, line 147 currently reads
grads_groups_flat.append(_flatten_dense_tensors([p.grad for p in group]))
If you replace that with
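a version that substitutes an all-zeros tensor for each parameter whose gradient is None (the exact snippet below is a reconstruction from the description that follows, not the author’s verbatim code), e.g.

grads_groups_flat.append(_flatten_dense_tensors(
    [p.grad if p.grad is not None else torch.zeros_like(p) for p in group]))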
you get rid of the bug. I would be happy to submit a pull request for this fix. However, I am not sure whether this is really the optimal solution, since we need to allocate a new all-zeros tensor for every parameter whose gradient is None, even though this clearly adds nothing to the computation. I haven’t found a way around that yet: because all gradients and parameters get flattened, it is essential that the zeros are there.
@adihaviv Yes, there is a planned fix. I’m reworking FusedAdam so that it won’t require param flattening anymore (WIP branch is https://github.com/NVIDIA/apex/tree/multi_tensor_sgd) and None gradients should be acceptable in a single process.
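Until that rework lands, one user-side workaround in the spirit of the fix above (a standalone sketch with a placeholder model and optimizer, not code from apex or the issue) is to assign an explicit zero gradient to every trainable parameter that received none before calling step():

import torch
import torch.nn as nn

# Placeholder model: "unused" never contributes to the loss, so its
# parameters end up with grad == None after backward().
model = nn.ModuleDict({"used": nn.Linear(4, 4), "unused": nn.Linear(4, 4)})
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

loss = model["used"](torch.randn(2, 4)).sum()
loss.backward()

# Fill in zeros so downstream code that flattens p.grad never sees None.
for p in model.parameters():
    if p.requires_grad and p.grad is None:
        p.grad = torch.zeros_like(p)

optimizer.step()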