m0 / v1 init

Allo @lucidrains, I’ve been fiddling with this optimizer, and it looks promising so far. I was looking for other interpretations out there to settle my doubts about the missing bias correction… I’m assuming it’s deemed unnecessary due to the explicit m0 and v1 init, but wasn’t 100% sure it wasn’t just left out for clarity.

I noticed you left m0 as zero, and v1 as an interpolation with zero init… did you experiment with that vs the init described in the paper, Algorithm 1?

The core of my attempt below (note I flipped the betas to be comparable to adam/lamb/etc: .98, .92, .99)

    state = self.state[p]
    if len(state) == 0:
        state['step'] = 0
        state['grad'] = torch.zeros_like(grad)
        state['m'] = torch.clone(grad)  # init m0 = g0
        state['v'] = torch.zeros_like(grad)
        state['n'] = torch.zeros_like(grad)

    m, v, n = state['m'], state['v'], state['n']
    # NOTE first step is no-op as we need g0 & g1 for first grad delta (g1 - g0)
    if state['step'] > 0:
        m.lerp_(grad, 1. - beta1)
        grad_delta = grad - state['grad']
        if state['step'] > 1:
            v.lerp_(grad_delta, 1. - beta2)
        else:
            v.copy_(grad_delta)  # init v1 = g1 - g0
        n.lerp_((grad + beta2 * grad_delta).square(), 1. - beta3)

        # FIXME paper Algorithm 1 includes no bias correction
        # Do the m0 and v1 init special cases obviate the need, or was it left out of the paper for clarity?
        denom = 1 + group['weight_decay'] * lr
        step_size = lr * (n + group['eps']).rsqrt()
        p.addcmul_(step_size, m.add(v, alpha=beta2), value=-1.).div_(denom)

    state['grad'].copy_(grad)
    state['step'] += 1
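The bias question above can be illustrated with a scalar sketch. It is not taken from the paper or from any Adan implementation; it assumes an Adam-style EMA, `m = beta * m + (1 - beta) * g`, and a constant gradient, and the helper name `ema` is hypothetical. With zero init the EMA is shrunk by a factor of `1 - beta**t`, which dividing by that factor undoes. Initializing with the first gradient avoids the shrinkage in this constant-gradient case, at the cost of weighting g0 more heavily.

```python
def ema(grads, beta, m0):
    """Adam-style EMA trace: m = beta * m + (1 - beta) * g (assumed convention)."""
    m = m0
    trace = []
    for g in grads:
        m = beta * m + (1.0 - beta) * g
        trace.append(m)
    return trace


beta1 = 0.98
grads = [1.0] * 5  # constant gradient, chosen only for illustration

# Zero init: the estimate starts far below the true mean of 1.0.
zero_init = ema(grads, beta1, m0=0.0)

# Same trace with the usual correction 1 / (1 - beta**t), t counted from 1.
corrected = [m / (1.0 - beta1 ** (t + 1)) for t, m in enumerate(zero_init)]

# Init m0 = g0, then update with the remaining gradients.
g0_init = ema(grads[1:], beta1, m0=grads[0])

print(zero_init[0], corrected[0], g0_init[0])
```

For non-constant gradients the two approaches are no longer equivalent, so this only shows why the special-case init might look sufficient, not that it is.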

Top GitHub Comments

XingyuXie commented, Aug 29, 2022

Sorry for the confusion here. Adan does indeed use bias correction in its implementation, but we needed the algorithm's presentation to be consistent with the theoretical analysis, so we did not explicitly emphasize it in Algorithm 1. We’ll release the code in a few days (2-3 days, since we have a code review procedure). The log and config files will be released together with it. @rwightman

XingyuXie commented, Aug 30, 2022

@lucidrains Thanks for updating; here are some minor modifications. When we implemented Adan, we referred to some optimizer implementations in timm.

Line 55:

    state['prev_grad'] = grad

Line 85-86:

    correct_m = 1 / bias_correct1  # correction term for m'
    correct_v = 1 / bias_correct2  # correction term for v

Line 91:

    weighted_step_size = lr / ((n.sqrt()/sqrt_bias_correct3).add_(eps))
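To see how these corrections might fit together, here is a minimal scalar sketch of one bias-corrected step. It is not the released Adan code. It assumes Adam-style betas as in the original post (.98, .92, .99), assumes `bias_correctK = 1 - betaK ** step`, and reads the "Line 55" change as initializing `prev_grad` to the first gradient so that the first gradient difference is zero. The function name `adan_step` and the dict-based `state` are hypothetical.

```python
import math


def adan_step(p, grad, state, lr=1e-3, betas=(0.98, 0.92, 0.99),
              eps=1e-8, weight_decay=0.0):
    """One scalar Adan-like step with bias correction (illustrative sketch)."""
    beta1, beta2, beta3 = betas
    if not state:
        # Assumption: prev_grad starts as the first gradient, so diff == 0 on step 1.
        state.update(step=0, m=0.0, v=0.0, n=0.0, prev_grad=grad)

    state["step"] += 1
    step = state["step"]
    diff = grad - state["prev_grad"]

    # EMAs of the gradient, the gradient difference, and the squared combined term.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * diff
    update = grad + beta2 * diff
    state["n"] = beta3 * state["n"] + (1 - beta3) * update * update

    # Assumed form of the bias-correction terms named in the comment above.
    bias_correct1 = 1 - beta1 ** step
    bias_correct2 = 1 - beta2 ** step
    sqrt_bias_correct3 = math.sqrt(1 - beta3 ** step)

    correct_m = 1 / bias_correct1  # correction term for m
    correct_v = 1 / bias_correct2  # correction term for v
    weighted_step_size = lr / (math.sqrt(state["n"]) / sqrt_bias_correct3 + eps)

    p = p - weighted_step_size * (state["m"] * correct_m
                                  + beta2 * state["v"] * correct_v)
    p = p / (1 + lr * weight_decay)  # decoupled weight decay, as in the post above

    state["prev_grad"] = grad
    return p


state = {}
p1 = adan_step(1.0, 0.5, state, lr=0.1)
print(p1)
```

Under these assumptions the first step moves the parameter by roughly `lr` in the direction opposite the gradient, because the corrected `m` equals the gradient and the corrected `sqrt(n)` equals its magnitude.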

Tips:

For fairness and ease of use, we do not enable the restart condition in practice.
Adan can tolerate a large peak LR. For example, except for the experiments for the pre-training of MAE and LSTM, Adan’s LR is 5-10 times that of Adam/AdamW.
Adan seems to be relatively sensitive to beta3. Adjusting beta1 and beta2 has a limited effect on the results, especially beta2.
Interestingly, we found that weight_decay = 0.02 seems to be suitable for most experiments.
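As a rough sketch, the tips above could be collected into a starting configuration derived from an existing Adam/AdamW learning rate. The helper name `adan_starting_config` and the dict keys are hypothetical and do not correspond to any library API; the betas are the Adam-style values from the original post, not values given in the tips.

```python
def adan_starting_config(adamw_lr):
    """Hypothetical helper: turn the tips above into a starting point for tuning."""
    return {
        # Tip: Adan's peak LR was 5-10x that of Adam/AdamW
        # (the MAE pre-training and LSTM experiments were exceptions).
        "lr_range": (5 * adamw_lr, 10 * adamw_lr),
        # Assumption: Adam-style betas quoted in the original post.
        # Tip: beta3 is the sensitive one; beta1 and beta2 matter less.
        "betas": (0.98, 0.92, 0.99),
        # Tip: weight_decay = 0.02 seemed suitable for most experiments.
        "weight_decay": 0.02,
        # Tip: the restart condition is not enabled in practice.
        "restart": False,
    }


cfg = adan_starting_config(1e-3)
print(cfg)
```

These values are only a place to begin a sweep; the thread itself notes exceptions to the LR guideline.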
