After resuming training, scheduler.step() will not update the optimizer's learning rate
I found a bug: when I resume training from a checkpoint, the learning rate always equals the init_lr I originally set. After debugging, I found that scheduler.step() does not change the optimizer's learning rate. So I set it manually to work around the bug:
def on_epoch_start(self) -> None:
    # Manually copy the scheduler's current learning rate into the optimizer
    self.optimizers().param_groups[0]['lr'] = self.lr_schedulers().get_lr()[0]
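The hook above depends on Lightning's API, but the underlying idea is simply to copy the scheduler's current rate into every optimizer param group. Here is a minimal stand-in sketch of that sync step using plain dictionaries (no PyTorch or Lightning required; `FakeOptimizer`, `FakeScheduler`, and `sync_lr` are hypothetical names for illustration, and `get_last_lr` mirrors the method name current torch schedulers expose):

```python
class FakeOptimizer:
    """Toy optimizer: each param group is a dict holding hyperparameters like 'lr'."""
    def __init__(self, lr):
        self.param_groups = [{'lr': lr}]

class FakeScheduler:
    """Toy scheduler that just reports the rates the schedule would produce."""
    def __init__(self, lrs):
        self._lrs = lrs

    def get_last_lr(self):
        return self._lrs

def sync_lr(optimizer, scheduler):
    """Copy the scheduler's current rate(s) into the optimizer's param groups."""
    for group, lr in zip(optimizer.param_groups, scheduler.get_last_lr()):
        group['lr'] = lr

opt = FakeOptimizer(lr=1e-4)    # stale init_lr, as seen after a resume
sched = FakeScheduler([1e-7])   # e.g. warm-up factor 1e-3 applied to 1e-4
sync_lr(opt, sched)
print(opt.param_groups[0]['lr'])  # 1e-07
```

In real code the same loop runs over `self.optimizers().param_groups` and `self.lr_schedulers()`, but the toy version shows that the fix is nothing more than re-syncing two dictionaries.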
Issue Analytics
- State:
- Created a year ago
- Reactions: 1
- Comments: 12 (2 by maintainers)
I had the same issue and looked into it a little bit. It turns out that by default self.optimizers() returns optimizers from trainer.strategy._lightning_optimizers, and LightningOptimizer maintains its own copy of the param_groups field. The parameters themselves are stored as references to the actual parameters, but the learning rate is a plain value that gets copied. This behaviour traces back to load_state_dict of the PyTorch optimizer, which overwrites the param_groups list with a list built from the state dict, plugging only the 'params' values back in. From that point on, the copy of param_groups maintained by LightningOptimizer is no longer kept up to date.
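The stale-copy mechanism described above can be modelled in a few lines of plain Python (these are toy classes, not the real torch or Lightning implementations): load_state_dict swaps in fresh group dicts, so any wrapper that cached a reference to the old list keeps seeing the old learning rate.

```python
class Optimizer:
    """Toy model of torch.optim.Optimizer's param_groups handling."""
    def __init__(self, params, lr):
        self.param_groups = [{'params': params, 'lr': lr}]

    def load_state_dict(self, state_dict):
        # Like the real load_state_dict: replace param_groups with fresh
        # dicts from the checkpoint, plugging the live 'params' back in.
        # External references to the OLD group dicts go stale.
        old_params = [g['params'] for g in self.param_groups]
        self.param_groups = [dict(g) for g in state_dict['param_groups']]
        for group, params in zip(self.param_groups, old_params):
            group['params'] = params

class LightningOptimizerLike:
    """Toy wrapper that, like LightningOptimizer, caches param_groups."""
    def __init__(self, optimizer):
        self.optimizer = optimizer
        self.param_groups = optimizer.param_groups  # reference to the OLD list

opt = Optimizer(params=['w'], lr=1e-4)
wrapper = LightningOptimizerLike(opt)

# Restoring a checkpoint swaps in new group dicts:
opt.load_state_dict({'param_groups': [{'lr': 1e-7}]})

print(opt.param_groups[0]['lr'])      # 1e-07  (real optimizer is updated)
print(wrapper.param_groups[0]['lr'])  # 0.0001 (wrapper's cached copy is stale)
```

This is why reading the LR through the wrapper shows the stale value while the underlying optimizer is actually correct.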
I think a simple solution would be for the strategy to create/update its _lightning_optimizers after restoring from a checkpoint. As a user, you can call
self.optimizers(use_pl_optimizer=False).param_groups[0]['lr']
instead to work around the issue for now, though I don't know whether bypassing the LightningOptimizer wrapper has side effects with the various training strategies. A little example: after a fit() that restored from a checkpoint, it looks like this (with an LR of 1e-4 and a scheduler starting at factor 1e-3):
I have checked that the scheduler and the optimizer have different learning rates. The scheduler's learning rate is correct, but the optimizer's learning rate is not being updated by the scheduler.
---- Original email reply ---- | From | Rohit @.> | | Date | April 20, 2022, 16:27 | | To | @.> | | Cc | @.@.> | | Subject | Re: [PyTorchLightning/pytorch-lightning] After resuming training scheduler.step() will not update optimizer's learning rate (Issue #12812) |
Did you check the actual learning rate here?
self.optimizers().param_groups[0]['lr']
Since while resuming, the optimizer's state is restored as well, which includes the learning rate.