Question: KFAC/EKFAC with SGLD optimizer
Thanks for publishing the source code for your KFAC/EKFAC preconditioner implementation!
I’m sorry if this isn’t the right place to ask/discuss this, but I tried using both the KFAC and EKFAC preconditioners in combination with an SGLD optimizer (basically SGD plus Gaussian noise) to perform K-FAC SGLD. The result was not as expected: the training loss increased consistently over time. I’m stuck, because this behavior persisted no matter how I increased/decreased the values or flipped the boolean values of the preconditioner’s various hyperparameters.
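(For concreteness, here is a minimal sketch of the SGLD update I mean, using the Welling & Teh scaling; this is standard SGLD, not code from this repo:)

```python
import math
import torch

def sgld_step(params, lr):
    """One plain SGLD step: theta <- theta - (lr/2) * grad + N(0, lr)."""
    with torch.no_grad():
        for p in params:
            if p.grad is not None:
                p.add_(-0.5 * lr * p.grad + math.sqrt(lr) * torch.randn_like(p))
```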
The model that I used contains 2 `Conv2d` layers and 2 `Linear` layers. Below are the loss and accuracy plots, with hyperparameters as follows (the corresponding preconditioner construction is sketched after the list):
- learning rate: 0.0005
- eps: 0.5
- sua: True
- pi: True
- update_freq: 5
- alpha: 1.
- constraint_norm: True
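(A minimal sketch of how I construct the preconditioner with these values; the `KFAC` constructor arguments are assumed to match the hyperparameter names above, and the model is a hypothetical stand-in for my 2×`Conv2d` + 2×`Linear` network:)

```python
import torch.nn as nn
from kfac import KFAC  # the preconditioner published in this repo

# Hypothetical MNIST model matching the description: 2 Conv2d + 2 Linear layers.
model = nn.Sequential(
    nn.Conv2d(1, 32, 3), nn.ReLU(),
    nn.Conv2d(32, 64, 3), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(64 * 24 * 24, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

preconditioner = KFAC(
    model,
    eps=0.5,
    sua=True,
    pi=True,
    update_freq=5,
    alpha=1.0,
    constraint_norm=True,
)
```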
Changing the optimizer to classic SGD, without changing the preconditioner’s hyperparameters, worked incredibly well though: it achieved 99.99% accuracy on MNIST after just 3 epochs.
Any suggestions on how to tweak the hyperparameters, or in which direction? Or do you think this way of preconditioning SGLD using KFAC/EKFAC won’t work for some reason?
Our preconditioner works by rescaling the gradients stored in each parameter’s `.grad` buffer in place, before the optimizer consumes them.
In order to have the preconditioner also applied to your Gaussian noise, I recommend you add the noise to the parameter gradients before calling the preconditioner.
This means:
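(The code that originally followed is not preserved; below is a minimal sketch of what the recommended step order could look like, reusing `model`, `preconditioner`, and a plain SGD `optimizer` from the sketches above. The noise scale is an illustrative choice, not something stated in this thread:)

```python
import math
import torch

lr = 5e-4  # the learning rate from the question

for inputs, targets in train_loader:
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()

    # Inject the SGLD Gaussian noise into the gradient buffers *before*
    # preconditioning, so the preconditioner transforms the noise too.
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                # Gradient-space noise that the SGD update (theta -= lr * grad)
                # turns into N(0, lr) noise in parameter space; the exact
                # scale depends on your SGLD formulation.
                p.grad.add_(torch.randn_like(p) / math.sqrt(lr))

    preconditioner.step()  # rescales p.grad in place
    optimizer.step()       # the plain SGD step now performs preconditioned SGLD
```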
Thomas
It can be anything as long as it is positive, so it is not limited to the range 0–1.