Support gradient clipping
See original GitHub issue.

When using a differentiable optimizer we call diffopt.step(loss), which updates the parameters of the functional model and returns them, so it does not seem possible to do gradient clipping. Would it be a good idea to add a function-type argument to the step method of differentiable optimizers that takes either all_grads or grouped_grads as input and returns the modified gradients, or is there a better way of doing this? I could work on a PR implementing this given some guidance.
Issue Analytics
- State: Closed
- Created: 4 years ago
- Comments: 5 (3 by maintainers)
Top Results From Across the Web
- Understanding Gradient Clipping (and How It Can Fix ...): Gradient Clipping is a method where the error derivative is changed or clipped to a threshold during backward propagation through the network, and ...
- Introduction to Gradient Clipping Techniques with Tensorflow: Gradient clipping involves capping the error derivatives before propagating them back through the network. The capped gradients are used to update the weights ...
- How to Avoid Exploding Gradients With Gradient Clipping: Gradient clipping involves forcing the gradient values (element-wise) to a specific minimum or maximum value if the gradient exceeded an ...
- What is Gradient Clipping? - Towards Data Science: Gradient clipping ensures the gradient vector g has norm at most c. This helps gradient descent to have a reasonable behaviour even if ...
- Gradient Clipping - Medium: Gradient clipping will 'clip' the gradients or cap them to a threshold value to prevent the gradients from getting too large. In the ...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@MichaelKonobeev could you please take a look at #21 and see if it fits your needs? Feedback/improvement suggestions welcome.
Please note that if you literally clip the gradients, it will not be possible to take higher-order gradients (backprop through backprop), as they will not exist. A differentiable, continuous relaxation of the clipping operation will be required.
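To make that remark concrete, here is a rough sketch (not code from higher) of a smooth relaxation of clip-by-global-norm; soft_clip_by_norm is a name invented for this example.

```python
import torch

def soft_clip_by_norm(all_grads, max_norm=1.0, eps=1e-6):
    # Hard clipping rescales by min(1, max_norm / ||g||), which has a kink at
    # ||g|| = max_norm. This variant rescales by
    #   max_norm * tanh(||g|| / max_norm) / ||g||,
    # so the resulting norm approaches max_norm asymptotically and the mapping
    # is differentiable everywhere, keeping backprop-through-backprop defined.
    total_norm = torch.norm(torch.stack([g.norm() for g in all_grads]))
    scale = max_norm * torch.tanh(total_norm / max_norm) / (total_norm + eps)
    return [g * scale for g in all_grads]
```

With a callback-style interface like the one sketched under the issue body, such a function could be passed in place of a hard clip when second-order gradients are needed.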
@MichaelKonobeev Your utility functions would be most welcome contributions in higher.utils, if you would like to submit a separate PR. It would be helpful if minimal tests for these functions were added to tests/utils.py.
For now, I am closing this issue as I am assuming it is addressed by #21. Please re-open if it does not address the issue, or flag new problems in a separate issue.