
Question: KFAC/EKFAC with SGLD optimizer


Thanks for publishing the source code for your KFAC/EKFAC preconditioner implementation!

I’m sorry if you think this isn’t the right place to ask/discuss this, but I tried using both the KFAC and the EKFAC preconditioner in combination with an SGLD optimizer (basically SGD + Gaussian noise) to perform K-FAC SGLD. The result is not what I expected: the training loss increases consistently over time. I’m stuck, because this behavior didn’t change even though I tried randomly increasing/decreasing the values, and flipping the boolean values, of the various preconditioner hyperparameters.
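For reference, SGLD here means the plain SGD update plus per-step Gaussian noise; a minimal sketch (the sqrt(2 * lr) noise scale is the textbook Langevin discretization and is an assumption, since the issue doesn’t show the exact sampler used):

```python
import torch

@torch.no_grad()
def sgld_update(params, lr):
    # Plain SGD step plus Gaussian noise:
    # theta <- theta - lr * grad + sqrt(2 * lr) * N(0, I)
    for p in params:
        if p.grad is None:
            continue
        p.add_(p.grad, alpha=-lr)
        p.add_((2.0 * lr) ** 0.5 * torch.randn_like(p))
```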

The model I used contains 2 Conv2d layers and 2 Linear layers. Below are the loss and accuracy plots, with the hyperparameters as follows (a sketch of the corresponding setup appears after this list):

  • learning rate: 0.0005
  • eps: 0.5
  • sua: True
  • pi: True
  • update_freq: 5
  • alpha: 1.
  • constraint_norm: True
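Assuming these hyperparameters map directly onto the constructor arguments of the KFAC class in Thrandis/EKFAC-pytorch (and with made-up layer sizes, since the issue doesn’t give them), the setup would look roughly like:

```python
import torch
import torch.nn as nn
from kfac import KFAC  # kfac.py from Thrandis/EKFAC-pytorch

# A stand-in for the 2x Conv2d + 2x Linear MNIST model described above;
# the layer sizes here are assumptions.
model = nn.Sequential(
    nn.Conv2d(1, 16, 3), nn.ReLU(),
    nn.Conv2d(16, 32, 3), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 24 * 24, 128), nn.ReLU(),
    nn.Linear(128, 10),
)
precond = KFAC(model, eps=0.5, sua=True, pi=True,
               update_freq=5, alpha=1.0, constraint_norm=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.0005)
```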

[Plots: K-FAC SGLD training loss and accuracy (ksgld_loss, ksgld_accuracy)]

Changing the optimizer to classic SGD without changing the preconditioner’s hyperparameters worked incredibly well though. It achieved 99.99% accuracy on MNIST after just 3 epochs.

Any suggestions on how to tweak the hyperparameters, or in which direction? Or do you think this way of preconditioning SGLD with KFAC/EKFAC won’t work for some reason?

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

tfjgeorge commented on Jan 26, 2021 (2 reactions)

Our preconditioner works by:

  1. reading the gradient from the parameter gradients (see https://github.com/Thrandis/EKFAC-pytorch/blob/master/ekfac.py#L94)
  2. multiplying it by the inverse Fisher
  3. writing the resulting natural gradient back into the parameter gradients (see https://github.com/Thrandis/EKFAC-pytorch/blob/master/ekfac.py#L111); a schematic sketch follows this list
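Schematically, the read/precondition/write-back cycle looks like the following. This is a simplified sketch, not the repository’s actual code: the real KFAC/EKFAC implementation works layer by layer with Kronecker-factored blocks and never materializes a dense inverse Fisher, so `inv_fisher_block` here is a hypothetical dense matrix.

```python
import torch

def precondition_param(param, inv_fisher_block):
    # Schematic only; see the caveats in the paragraph above.
    g = param.grad.data.view(-1)                        # 1. read the gradient
    ng = inv_fisher_block @ g                           # 2. multiply by the inverse Fisher
    param.grad.data.copy_(ng.view_as(param.grad.data))  # 3. write the natural gradient back
```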

To have the preconditioner also applied to your Gaussian noise, I recommend adding the noise to the parameter gradients before calling the preconditioner.

This means, concretely (a minimal sketch follows this list):

  1. Calling loss.backward() (this populates all param.grad gradients)
  2. Looping through all parameters and adding Gaussian noise with param.grad.data.add_(epsilon)
  3. Calling precond.step()
  4. Calling optimizer.step()
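Putting the four steps together, a minimal sketch: the sqrt(2 * lr) noise scale is assumed (standard Langevin scaling, not stated in the thread), `precond` is a KFAC/EKFAC instance from this repo, and `optimizer` is a plain torch.optim.SGD.

```python
import torch

def ksgld_step(model, precond, optimizer, loss_fn, x, y, lr):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                           # 1. populate all param.grad
    with torch.no_grad():
        for p in model.parameters():          # 2. inject Gaussian noise into the gradients
            if p.grad is not None:
                p.grad.add_((2.0 * lr) ** 0.5 * torch.randn_like(p))
    precond.step()                            # 3. precondition gradient + noise together
    optimizer.step()                          # 4. plain SGD update
    return loss.item()
```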

Thomas

On Mon, Jan 25, 2021 at 10:02 PM Hanif Amal Robbani <notifications@github.com> wrote:

I got an idea! I think the problem is that the preconditioning is applied only to the gradient, while in SGLD it should be applied to both the gradient and the noise. From what I understand, I need to multiply the approximate inverse Fisher $\widetilde{F}^{-1}$ with the noise $\epsilon$:

[Image: the KSGLD parameter-update formula, in which $\widetilde{F}^{-1}$ multiplies both the stochastic gradient and the noise term $\epsilon$]

*) This version of the KSGLD parameter-update formula is taken from https://arxiv.org/abs/1806.02855

From what I understand, KFAC/EKFAC calculates a block-diagonal approximation of the inverse Fisher, then uses it to precondition the gradient, right? So do you have a suggestion on where in the KFAC preconditioner code the approximate inverse Fisher is ready for use? I tried to look at the code but couldn’t pinpoint exactly where the approximate inverse Fisher is computed.

Sorry for bothering you again. I’ve been grateful for your reply so far!


Thrandis commented on Jan 11, 2021 (1 reaction)

It can be anything as long as it is positive, so it is not limited to the range [0, 1].
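For intuition, eps here plays the role of damping added to the curvature factors before inversion, i.e. a Tikhonov-style regularizer; this reading of eps is an interpretation of the code, so treat it as an assumption. A toy illustration of the effect:

```python
import torch

# Toy illustration: eps damps the inverse of an ill-conditioned
# curvature block, (F + eps * I)^{-1}; this dense form is a
# simplification of the actual Kronecker-factored computation.
F = torch.diag(torch.tensor([4.0, 0.01]))  # hypothetical Fisher block
for eps in (0.01, 0.5, 10.0):
    inv = torch.linalg.inv(F + eps * torch.eye(2))
    print(f"eps={eps}: diag of inverse = {inv.diagonal().tolist()}")
# Small eps strongly amplifies low-curvature directions; large eps
# drives the inverse toward (1/eps) * I, i.e. rescaled SGD.
```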

On Sun, Jan 10, 2021, 20:37 Hanif Amal Robbani <notifications@github.com> wrote:

Thanks a lot for replying!

The value of eps should be in the range [0, 1], right?

I think I have read your comment on the intuition behind eps in the other GitHub issue, and I have probably tried eps=1 as well, but the result wasn’t any better. Maybe I just messed up some other parameters while trying that, so I’ll try again to make sure, maybe with eps=0.99.

