LR Finder doesn't restore original model weights?
Hey! I love this repo, thanks for making it 💯
Everything works well except for one thing, after some digging around/experimenting, here’s what I’ve found:
Below are some figures for the training loss and training accuracy (on MNIST, using a resnet18).
Problem:
- Option 1: running LRFinder on a model and then training that same model afterwards appears to hurt the model's learning (see pink curve below).
Solutions:
- Option 2: running LRFinder on the model and then manually restoring the weights trains the model optimally (see green curve below).
- Option 3: running LRFinder on a clone of the model and then using the original model for training also trains the model optimally (see green curve below).
Regarding the figure/graphs below, both models used the same hyperparameters.
An in-code example of option 1) would be similar to what was given in the README.md:
from torch_lr_finder import LRFinder
model = ...
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-7, weight_decay=1e-2)
lr_finder = LRFinder(model, optimizer, criterion, device="cuda")
lr_finder.range_test(trainloader, end_lr=100, num_iter=100)
lr_finder.plot()
# Then use "model" for training
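For completeness, an in-code example of option 2) (manually restoring the weights) could look like the sketch below. The weight mutation in step 2 just stands in for what `range_test` does to the model; the snapshot/restore logic around it is the point. (If your version of torch-lr-finder exposes `lr_finder.reset()`, I believe that performs the same restore for you.)

```python
import copy
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for your real model

# 1) Snapshot the weights BEFORE the range test
initial_state = copy.deepcopy(model.state_dict())

# 2) ... lr_finder.range_test(...) would run here and mutate the weights ...
with torch.no_grad():
    model.weight.add_(1.0)  # simulate the range test changing the weights

# 3) Manually restore the original weights before the real training run
model.load_state_dict(initial_state)
assert torch.equal(model.weight, initial_state["weight"])
```

Note the `copy.deepcopy`: `state_dict()` returns references to the live parameter tensors, so without the deepcopy the snapshot would change along with the model.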
An in-code example of option 3) would be:
from torch_lr_finder import LRFinder
model = ...
temp_model = ...  # create a model with the same architecture
# copy the weights over (state_dict is a method, so it must be called)
temp_model.load_state_dict(model.state_dict())
criterion = nn.CrossEntropyLoss()
# the optimizer must wrap the temp model's parameters, not the original's
optimizer = optim.Adam(temp_model.parameters(), lr=1e-7, weight_decay=1e-2)
# use the temp model in lr_finder
lr_finder = LRFinder(temp_model, optimizer, criterion, device="cuda")
lr_finder.range_test(trainloader, end_lr=100, num_iter=100)
lr_finder.plot()
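If re-creating the architecture by hand is awkward, `copy.deepcopy` should work as the clone step for most standard modules (an assumption on my part; models holding non-copyable handles may still need the explicit re-creation above):

```python
import copy
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for your real model

# Clone architecture + weights in one step instead of rebuilding by hand
temp_model = copy.deepcopy(model)

# Identical weights, but no shared storage: mutating the clone
# leaves the original untouched
assert torch.equal(temp_model.weight, model.weight)
assert temp_model.weight.data_ptr() != model.weight.data_ptr()
```

As in the snippet above, the optimizer handed to LRFinder should then be built from `temp_model.parameters()`, so the range test never touches `model`.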
Issue Analytics
- Created 4 years ago
- Comments: 23 (13 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
So I ran some experiments too, check out my project page: Optimizer Benchmarks
The Jupyter notebooks are in the GitHub repo; you can view them with the built-in notebook viewer!
Main conclusion from the project page:
Discarding an optimizer's `state` and re-using the same optimizer both achieve very similar performance, i.e. discarding an optimizer's `state` didn't really hurt the model's performance, with or without an LR scheduler.
Note: conclusions are based on the Adam optimizer and the OneCycle LR scheduler. I haven't experimented with other optimizers to see if dropping their `state` is more impactful.

Oh nice!
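To make the `state` discussion above concrete, here's a minimal sketch (assuming Adam) of what "discarding the optimizer's state" amounts to in PyTorch: building a fresh optimizer over the same parameters.

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 2)
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# One training step populates Adam's running moments in optimizer.state
model(torch.randn(8, 4)).sum().backward()
optimizer.step()
assert len(optimizer.state) == 2  # one entry each for weight and bias

# "Discarding the state" is just constructing a fresh optimizer over
# the same parameters; it starts with empty state
optimizer = optim.Adam(model.parameters(), lr=1e-3)
assert len(optimizer.state) == 0
```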
Let me know if you’re looking for any help, I’d be more than happy to contribute on any part of this lovely repo! 😃