Support gradient clipping
See original GitHub issue.

When using a differentiable optimizer we call diffopt.step(loss), which updates the parameters of the functional model and returns them, so it does not seem possible to do gradient clipping. Would it be a good idea to add a function-type argument to the step method of differentiable optimizers that takes either all_grads or grouped_grads as input and returns the modified gradients, or is there a better way of doing this? I could work on a PR implementing this given some guidance.
Issue Analytics
- State: Closed
- Created: 4 years ago
- Comments: 5 (3 by maintainers)
Top Results From Across the Web
- Understanding Gradient Clipping (and How It Can Fix ...): Gradient Clipping is a method where the error derivative is changed or clipped to a threshold during backward propagation through the network, and ...
- Introduction to Gradient Clipping Techniques with Tensorflow: Gradient clipping involves capping the error derivatives before propagating them back through the network. The capped gradients are used to update the weights ...
- How to Avoid Exploding Gradients With Gradient Clipping: Gradient clipping involves forcing the gradient values (element-wise) to a specific minimum or maximum value if the gradient exceeded an ...
- What is Gradient Clipping? - Towards Data Science: Gradient clipping ensures the gradient vector g has norm at most c. This helps gradient descent to have a reasonable behaviour even if ...
- Gradient Clipping - Medium: Gradient clipping will 'clip' the gradients or cap them to a threshold value to prevent the gradients from getting too large. In the ...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@MichaelKonobeev could you please take a look at #21 and see if it fits your needs? Feedback/improvement suggestions welcome.
Please note that if you literally clip the gradients, it will not be possible to take higher-order gradients (backprop through backprop), as they will not exist. A differentiable, continuous relaxation of the clipping operation will be required.
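To make that remark concrete, here is a rough sketch (not code from higher) of a smooth relaxation of clip-by-global-norm; soft_clip_by_norm is a name invented for this example.

```python
import torch

def soft_clip_by_norm(all_grads, max_norm=1.0, eps=1e-6):
    # Hard clipping rescales by min(1, max_norm / ||g||), which has a kink at
    # ||g|| = max_norm. This variant rescales by
    #   max_norm * tanh(||g|| / max_norm) / ||g||,
    # so the resulting norm approaches max_norm asymptotically and the mapping
    # is differentiable everywhere, keeping backprop-through-backprop defined.
    total_norm = torch.norm(torch.stack([g.norm() for g in all_grads]))
    scale = max_norm * torch.tanh(total_norm / max_norm) / (total_norm + eps)
    return [g * scale for g in all_grads]
```

With a callback-style interface like the one sketched under the issue body, such a function could be passed in place of a hard clip when second-order gradients are needed.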
@MichaelKonobeev Your utility functions would be most welcome contributions in higher.utils, if you would like to submit a separate PR. It would be helpful if minimal tests for these functions were added to tests/utils.py.
For now, I am closing this issue as I am assuming it is addressed by #21. Please re-open if it does not address the issue, or flag new problems in a separate issue.