
Support gradient clipping

See original GitHub issue

When using a differentiable optimizer we call diffopt.step(loss), which updates the parameters of a functional model and returns them, so it does not seem possible to do gradient clipping. Would it be a good idea to add a function-type argument to the step method of differentiable optimizers that takes as input either all_grads or grouped_grads and returns modified gradients, or is there a better way of doing this? I could work on a PR implementing this given some guidance.
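To make the proposal concrete, here is a minimal sketch of what such a hook could look like, assuming a keyword argument on diffopt.step. The name grad_callback and the callback signature are illustrative assumptions, not confirmed API at the time of the question (note also the maintainer's caveat below about hard clipping and second-order gradients):

```python
import torch
import higher

model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)


def clip_callback(all_grads):
    # Illustrative hook: receives the list of gradients computed inside
    # diffopt.step and returns a list of (possibly modified) gradients.
    return [torch.clamp(g, -1.0, 1.0) for g in all_grads]


x, y = torch.randn(8, 4), torch.randn(8, 1)
with higher.innerloop_ctx(model, opt) as (fmodel, diffopt):
    loss = torch.nn.functional.mse_loss(fmodel(x), y)
    # Hypothetical signature: pass the callback so gradients can be
    # transformed before the differentiable parameter update.
    diffopt.step(loss, grad_callback=clip_callback)
```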

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
egrefen commented, Jan 14, 2020

@MichaelKonobeev could you please take a look at #21 and see if it fits your needs? Feedback/improvement suggestions welcome.

Please note that if you literally clip gradients, it will not be possible to take higher order gradients (backprop through backprop) as they will not exist. A differentiable continuous relaxation of the clipping operation will be required.
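One way to realise such a relaxation (not prescribed in the thread; this is only a sketch) is to replace the hard clamp with a smooth, bounded surrogate such as a tanh-based soft clip, which is differentiable everywhere so gradients of gradients still exist:

```python
import torch


def soft_clip(grads, c=1.0):
    # Smooth, bounded surrogate for element-wise clipping: values are
    # squashed into (-c, c) by c * tanh(g / c), and the mapping is
    # differentiable everywhere, unlike a hard clamp at +/- c.
    return [c * torch.tanh(g / c) for g in grads]


# Small values pass through almost unchanged, large values saturate near c.
g = torch.tensor([0.1, 5.0, -20.0], requires_grad=True)
print(soft_clip([g])[0])  # approximately [0.0997, 0.9999, -1.0000]
```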

0 reactions
egrefen commented, Jan 16, 2020

@MichaelKonobeev Your utility functions would be most welcome contributions in higher.utils, if you would like to submit a separate PR. It would be helpful if minimal tests for these functions were added to tests/utils.py.
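As a purely hypothetical illustration of the kind of contribution being discussed (the function name, module placement, and test are all assumptions, not higher's actual code), such a utility and its minimal test might look like:

```python
import torch


def rescale_grads(grads, max_norm, eps=1e-6):
    # Hypothetical utility that could live in higher/utils.py: rescale a list
    # of gradients so their combined norm does not exceed max_norm. The scale
    # factor is itself computed from the gradients, so it remains
    # differentiable away from the threshold.
    total_norm = torch.norm(torch.stack([g.norm() for g in grads]))
    scale = torch.clamp(max_norm / (total_norm + eps), max=1.0)
    return [g * scale for g in grads]


def test_rescale_grads_caps_total_norm():
    # Minimal pytest-style test of the kind that could go in tests/utils.py.
    grads = [torch.full((3,), 10.0), torch.full((2,), -7.0)]
    clipped = rescale_grads(grads, max_norm=1.0)
    total = torch.norm(torch.stack([g.norm() for g in clipped]))
    assert total <= 1.0 + 1e-4
```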

For now, I am closing this issue as I am assuming it is addressed by #21. Please re-open if it does not address the issue, or flag new problems in a separate issue.


Top Results From Across the Web

Understanding Gradient Clipping (and How It Can Fix ...
Gradient Clipping is a method where the error derivative is changed or clipped to a threshold during backward propagation through the network, and...
Introduction to Gradient Clipping Techniques with Tensorflow
Gradient clipping involves capping the error derivatives before propagating them back through the network. The capped gradients are used to update the weights ......
How to Avoid Exploding Gradients With Gradient Clipping
Gradient clipping involves forcing the gradient values (element-wise) to a specific minimum or maximum value if the gradient exceeded an ...
What is Gradient Clipping? - Towards Data Science
Gradient clipping ensures the gradient vector g has norm at most c. This helps gradient descent to have a reasonable behaviour even if...
Gradient Clipping - Medium
Gradient clipping will 'clip' the gradients or cap them to a Threshold value to prevent the gradients from getting too large. In the...
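For reference, the two standard flavours of clipping described in the results above (element-wise value clipping and norm-based rescaling) correspond to PyTorch's built-in utilities, shown here in an ordinary training step outside of higher (both are applied only for illustration; normally you would pick one):

```python
import torch
from torch import nn

model = nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()

# Norm clipping: rescale all gradients so their combined norm is at most 1.0.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

# Value clipping: clamp each gradient element to [-0.5, 0.5].
torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=0.5)

opt.step()
```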
