LR Finder doesn't restore original model weights?
Hey! I love this repo, thanks for making it 💯
Everything works well except for one thing, after some digging around/experimenting, here’s what I’ve found:
Below are some figures for the training loss and training accuracy (on MNIST, using a resnet18).
Problem:
- Option 1: running LRFinder on a model and then training that same model afterwards appears to hurt the model's learning (see pink curve below).
Solutions:
- Option 2: running LRFinder on the model and then manually restoring the weights trains the model optimally (see green curve below).
- Option 3: running LRFinder on a clone of the model and then using the original model for training also trains the model optimally (see green curve below).
Regarding the figure/graphs below, both models used the same hyperparameters.
An in-code example of option 1) would be similar to what was given in the README.md:
from torch_lr_finder import LRFinder
model = ...
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-7, weight_decay=1e-2)
lr_finder = LRFinder(model, optimizer, criterion, device="cuda")
lr_finder.range_test(trainloader, end_lr=100, num_iter=100)
lr_finder.plot()
# Then use "model" for training
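For completeness, an in-code example of option 2) (manually restoring the weights) could look like the sketch below. The weight mutation in step 2 just stands in for what `range_test` does to the model; the snapshot/restore logic around it is the point. (If your version of torch-lr-finder exposes `lr_finder.reset()`, I believe that performs the same restore for you.)

```python
import copy
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for your real model

# 1) Snapshot the weights BEFORE the range test
initial_state = copy.deepcopy(model.state_dict())

# 2) ... lr_finder.range_test(...) would run here and mutate the weights ...
with torch.no_grad():
    model.weight.add_(1.0)  # simulate the range test changing the weights

# 3) Manually restore the original weights before the real training run
model.load_state_dict(initial_state)
assert torch.equal(model.weight, initial_state["weight"])
```

Note the `copy.deepcopy`: `state_dict()` returns references to the live parameter tensors, so without the deepcopy the snapshot would change along with the model.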
An in-code example of option 3) would be:
from torch_lr_finder import LRFinder
model = ...
temp_model = ...  # create a model with the same architecture
# copy the weights over (state_dict is a method, so it must be called)
temp_model.load_state_dict(model.state_dict())
criterion = nn.CrossEntropyLoss()
# the optimizer must wrap the temp model's parameters, not the original's
optimizer = optim.Adam(temp_model.parameters(), lr=1e-7, weight_decay=1e-2)
# use the temp model in lr_finder
lr_finder = LRFinder(temp_model, optimizer, criterion, device="cuda")
lr_finder.range_test(trainloader, end_lr=100, num_iter=100)
lr_finder.plot()
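If re-creating the architecture by hand is awkward, `copy.deepcopy` should work as the clone step for most standard modules (an assumption on my part; models holding non-copyable handles may still need the explicit re-creation above):

```python
import copy
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for your real model

# Clone architecture + weights in one step instead of rebuilding by hand
temp_model = copy.deepcopy(model)

# Identical weights, but no shared storage: mutating the clone
# leaves the original untouched
assert torch.equal(temp_model.weight, model.weight)
assert temp_model.weight.data_ptr() != model.weight.data_ptr()
```

As in the snippet above, the optimizer handed to LRFinder should then be built from `temp_model.parameters()`, so the range test never touches `model`.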
Issue Analytics
- Created 4 years ago
- Comments: 23 (13 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
So I ran some experiments too, check out my project page: Optimizer Benchmarks
The Jupyter notebooks are in the GitHub repo; you can view them with the built-in notebook viewer!
Main conclusion from the project page:
Discarding an optimizer's `state` and re-using the same optimizer both achieve very similar performance, i.e. discarding an optimizer's `state` didn't really hurt the model's performance, with or without an LR scheduler.
Note: conclusions are based on the Adam optimizer and the OneCycle LR scheduler. I haven't experimented with other optimizers to see if dropping their `state` is more impactful.

Oh nice!
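To make the `state` discussion above concrete, here's a minimal sketch (assuming Adam) of what "discarding the optimizer's state" amounts to in PyTorch: building a fresh optimizer over the same parameters.

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 2)
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# One training step populates Adam's running moments in optimizer.state
model(torch.randn(8, 4)).sum().backward()
optimizer.step()
assert len(optimizer.state) == 2  # one entry each for weight and bias

# "Discarding the state" is just constructing a fresh optimizer over
# the same parameters; it starts with empty state
optimizer = optim.Adam(model.parameters(), lr=1e-3)
assert len(optimizer.state) == 0
```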
Let me know if you’re looking for any help, I’d be more than happy to contribute on any part of this lovely repo! 😃