YOGI Initialization
exp_avg_sq Initialization
“Thus, for YOGI, we propose to initialize the vt based on gradient square evaluated at the initial point averaged over a (reasonably large) mini-batch.”
The initial exp_avg_sq should be set to the squared gradient, evaluated at the initial point and averaged over a mini-batch, as the quote above describes.
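A minimal, framework-free sketch of that initialization follows. The toy least-squares model, the helper names `minibatch_gradient` and `initial_exp_avg_sq`, and the sample data are all hypothetical and only for illustration. The sketch also assumes that "gradient square ... averaged over a mini-batch" means squaring the mini-batch-averaged gradient elementwise; the quote could also be read as averaging per-sample squared gradients.

```python
def minibatch_gradient(w, batch):
    """Gradient of 0.5 * (w . x - y)^2, averaged over the mini-batch."""
    grad = [0.0] * len(w)
    for x, y in batch:
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            grad[i] += residual * xi / len(batch)
    return grad


def initial_exp_avg_sq(w, batch):
    """Assumed reading of the quote: v0 = (mini-batch gradient at w0)^2, elementwise."""
    g0 = minibatch_gradient(w, batch)
    return [g * g for g in g0]


# Hypothetical initial point and mini-batch of (features, target) pairs.
w0 = [0.0, 0.0]
batch = [([1.0, 2.0], 1.0), ([3.0, 4.0], 2.0)]

exp_avg_sq = initial_exp_avg_sq(w0, batch)
print(exp_avg_sq)
```

In a real optimizer this value would be stored in the per-parameter state in place of a constant accumulator, which requires one extra gradient evaluation before the first step.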
exp_avg Initialization
The YOGI optimizer's exp_avg should be initialized to zero instead of initial_accumulator, based on m0 above.
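To show how the two proposed initial values enter the update, here is a hedged sketch of a single Yogi step in plain Python. The function `yogi_step`, the state dictionary layout, the hyperparameter defaults, and the sample gradient are assumptions for illustration, not the code of any particular library. Bias correction is omitted, so library implementations that apply it will produce different numbers.

```python
import math


def yogi_step(w, grad, state, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-3):
    """One Yogi update applied in place to w and state (no bias correction)."""
    for i, g in enumerate(grad):
        g2 = g * g
        # First moment: exponential moving average of the gradient.
        state["exp_avg"][i] = beta1 * state["exp_avg"][i] + (1.0 - beta1) * g
        # Second moment: additive, sign-controlled update (the Yogi rule).
        v = state["exp_avg_sq"][i]
        sign = (v > g2) - (v < g2)  # sign(v - g^2) as -1, 0, or 1
        state["exp_avg_sq"][i] = v - (1.0 - beta2) * sign * g2
        # Parameter update.
        w[i] -= lr * state["exp_avg"][i] / (math.sqrt(state["exp_avg_sq"][i]) + eps)
    state["step"] += 1
    return w


# Hypothetical gradient at the initial point.
grad0 = [-3.5, -5.0]

state = {
    "step": 0,
    "exp_avg": [0.0] * len(grad0),          # first moment starts at zero
    "exp_avg_sq": [g * g for g in grad0],   # second moment starts at g0^2
}

w = yogi_step([0.0, 0.0], grad0, state)
print(w, state)
```

If the first step reuses the same gradient that produced the initial exp_avg_sq, the sign term is zero and the second moment is unchanged, so the first update has magnitude close to lr * (1 - beta1) in each coordinate.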
Top GitHub Comments
Sure, I will submit a PR with the change.
@PetrochukM I had doubts about this as well, so I referred to the authors' official TensorFlow implementation (https://github.com/tensorflow/addons/blob/master/tensorflow_addons/optimizers/yogi.py).
On line 119, they initialize the first and second moments with a constant value.