m0 / v1 init
Allo @lucidrains, I’ve been fiddling with this optimizer, and it’s looking promising so far. I was looking for other interpretations out there to settle my doubts about the missing bias correction. I’m assuming it’s deemed unnecessary due to the explicit m0 and v1 init, but I wasn’t 100% sure it wasn’t just left out for clarity.
I noticed you left m0 as zero, and v1 as an interpolation with a zero init… did you experiment with that versus the notes in the paper’s Algorithm 1?
The core of my attempt below (note I flipped the betas to be comparable to adam/lamb/etc: .98, .92, .99)
state = self.state[p]
if len(state) == 0:
    state['step'] = 0
    state['grad'] = torch.zeros_like(grad)
    state['m'] = torch.clone(grad)  # init m0 = g0
    state['v'] = torch.zeros_like(grad)
    state['n'] = torch.zeros_like(grad)

m, v, n = state['m'], state['v'], state['n']
# NOTE first step is no-op as we need g0 & g1 for first grad delta (g1 - g0)
if state['step'] > 0:
    m.lerp_(grad, 1. - beta1)
    grad_delta = grad - state['grad']
    if state['step'] > 1:
        v.lerp_(grad_delta, 1. - beta2)
    else:
        v.copy_(grad_delta)  # init v1 = g1 - g0
    n.lerp_((grad + beta2 * grad_delta).square(), 1. - beta3)

    # FIXME paper Algorithm 1 includes no bias correction
    # Do the m0 and v1 init special cases obviate the need, or was it left out of the paper for clarity?
    denom = 1 + group['weight_decay'] * lr
    step_size = lr * (n + group['eps']).rsqrt()
    p.addcmul_(step_size, m.add(v, alpha=beta2), value=-1.).div_(denom)

state['grad'].copy_(grad)
state['step'] += 1
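To make the init logic easier to poke at, here is a scalar, pure-Python restatement of the snippet above, so it runs without PyTorch. It is only a sketch: the function name `adan_scalar_steps`, the starting value of `p`, and the `lr` and `eps` defaults are assumptions for illustration, while the betas follow the flipped convention used above.

```python
import math


def adan_scalar_steps(grads, p=1.0, lr=1e-3, betas=(0.98, 0.92, 0.99),
                      eps=1e-8, weight_decay=0.02):
    """Scalar sketch of the update above; returns the value of p after each step.

    Hypothetical helper: lr, eps and the starting p are illustrative assumptions.
    """
    beta1, beta2, beta3 = betas
    m = v = n = prev_grad = 0.0
    history = []
    for step, grad in enumerate(grads):
        if step == 0:
            m = grad  # init m0 = g0; no parameter update yet (g1 - g0 is needed)
        else:
            m += (1.0 - beta1) * (grad - m)  # same as m.lerp_(grad, 1 - beta1)
            delta = grad - prev_grad
            if step > 1:
                v += (1.0 - beta2) * (delta - v)
            else:
                v = delta  # init v1 = g1 - g0
            # n starts at zero and is not bias corrected, as in the snippet
            n += (1.0 - beta3) * ((grad + beta2 * delta) ** 2 - n)
            step_size = lr / math.sqrt(n + eps)
            p = (p - step_size * (m + beta2 * v)) / (1.0 + weight_decay * lr)
        prev_grad = grad
        history.append(p)
    return history


print(adan_scalar_steps([1.0, 0.5, 0.25]))
```

In this toy run the first call leaves `p` untouched, and the first real update moves `p` by far more than `lr`, because `n` is still close to zero at that point. That is the practical side of the bias correction question.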
Top GitHub Comments
Sorry for the confusion. Adan does have bias correction in the implementation, but we needed to keep the algorithm presentation consistent with the theoretical analysis, so we did not explicitly emphasize it in Algorithm 1. We’ll release the code in a few days (2-3 days, since we have a code review procedure). The log and config files will be released together. @rwightman
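For reference, the generic Adam-style correction for a zero-initialized moving average looks like the sketch below. The released Adan code is not shown in this thread, so this is an assumption about the general technique, not Adan's actual implementation. The helper name `ema_with_bias_correction` is hypothetical, and `beta` uses the flipped (Adam-like) convention from the snippet in the question.

```python
def ema_with_bias_correction(values, beta):
    """Return (raw_ema, corrected_ema) pairs for a zero-initialized EMA.

    Hypothetical helper illustrating Adam-style bias correction only.
    """
    ema = 0.0
    out = []
    for t, x in enumerate(values, start=1):
        ema = beta * ema + (1.0 - beta) * x
        # Dividing by (1 - beta**t) undoes the pull toward the zero init
        out.append((ema, ema / (1.0 - beta ** t)))
    return out


for raw, corrected in ema_with_bias_correction([2.0, 2.0, 2.0], beta=0.99):
    print(raw, corrected)
```

With a constant input, the raw average starts far below the input while the corrected one matches it from the first step, which is why a zero-initialized `n` without correction inflates the early step sizes.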
@lucidrains Thanks for updating. The following are some minor modifications. When we implemented Adan, we referred to some of the optimizer implementations in timm.
Line 55:
state['prev_grad'] = grad
Line 85-86:
Line 91:
Tips: weight_decay = 0.02 seems to be suitable for most experiments.