Error if the gradient of tensor is None.
The gradient of a tensor may be None if the tensor is computed in the forward pass but never receives a gradient in the backward pass.
For example, I’m fine-tuning a model with BERT using the second-to-last encoded layer. The last layer is still computed in the forward pass, but its gradients are never calculated in the backward pass.
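A minimal sketch of how this situation arises (the module names are illustrative, not from the issue): a layer that runs in forward() but whose output never reaches the loss keeps grad equal to None after backward().

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(4, 4)
        self.unused = nn.Linear(4, 4)

    def forward(self, x):
        _ = self.unused(x)   # computed in the forward pass, but the result is discarded
        return self.used(x)

model = Model()
loss = model(torch.randn(2, 4)).sum()
loss.backward()

print(model.used.weight.grad is None)    # False: gradient was computed
print(model.unused.weight.grad is None)  # True: no gradient ever flows here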
The following is the error message.
File "/usr/lib/python3.7/site-packages/apex-0.1-py3.7-linux-x86_64.egg/apex/optimizers/fp16_optimizer.py", line 147, in step
grads_groups_flat.append(_flatten_dense_tensors([p.grad for p in group]))
File "/usr/lib/python3.7/site-packages/torch/_utils.py", line 194, in _flatten_dense_tensors
flat = torch.cat([t.contiguous().view(-1) for t in tensors], dim=0)
File "/usr/lib/python3.7/site-packages/torch/_utils.py", line 194, in <listcomp>
flat = torch.cat([t.contiguous().view(-1) for t in tensors], dim=0)
AttributeError: 'NoneType' object has no attribute 'contiguous'
Here is an actual fix:
In apex/optimizers/fp16_optimizer.py, line 147 currently reads
grads_groups_flat.append(_flatten_dense_tensors([p.grad for p in group]))
If you replace that with
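a version that substitutes an all-zeros tensor for each parameter whose gradient is None (the exact snippet below is a reconstruction from the description that follows, not the author’s verbatim code), e.g.

grads_groups_flat.append(_flatten_dense_tensors(
    [p.grad if p.grad is not None else torch.zeros_like(p) for p in group]))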
you get rid of the bug. I would be happy to submit a pull request for this fix. However, I am not sure whether this is really the optimal solution, since we need to allocate a new all-zeros tensor for every parameter whose gradient is None, even though this clearly adds nothing to the computation. I haven’t found a way around that yet: because all gradients and parameters get flattened, it is essential that the zeros are there.
@adihaviv Yes, there is a planned fix. I’m reworking FusedAdam so that it won’t require param flattening anymore (WIP branch is https://github.com/NVIDIA/apex/tree/multi_tensor_sgd) and None gradients should be acceptable in a single process.
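Until that rework lands, one user-side workaround in the spirit of the fix above (a standalone sketch with a placeholder model and optimizer, not code from apex or the issue) is to assign an explicit zero gradient to every trainable parameter that received none before calling step():

import torch
import torch.nn as nn

# Placeholder model: "unused" never contributes to the loss, so its
# parameters end up with grad == None after backward().
model = nn.ModuleDict({"used": nn.Linear(4, 4), "unused": nn.Linear(4, 4)})
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

loss = model["used"](torch.randn(2, 4)).sum()
loss.backward()

# Fill in zeros so downstream code that flattens p.grad never sees None.
for p in model.parameters():
    if p.requires_grad and p.grad is None:
        p.grad = torch.zeros_like(p)

optimizer.step()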