configure_optimizers with OneCycleLR and Pretrain Freeze/Unfreeze
Hello. Thanks for the work on this framework - it's something I've been looking for, and I am currently working on transitioning all of my own work from fast.ai to pytorch-lightning. I'm currently stuck on the configure_optimizers step.
For those not familiar, the core workflow of fast.ai goes something like this:
#create model with frozen pretrained resnet backbone and untrained linear head
model = MyResnetBasedModel()
learner = Learner(model, ...)
#train the head
learner.fit_one_cycle(5)
#unfreeze pretrained layers and train whole model
learner.unfreeze()
learner.fit_one_cycle(5)
fast.ai uses its own system for implementing the OneCycleScheduler, and it's not the most transparent one. PyTorch has its own implementation, OneCycleLR, which the documentation illustrates as follows:
data_loader = torch.utils.data.DataLoader(...)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.01, steps_per_epoch=len(data_loader), epochs=10)
Note that OneCycleLR needs to know the total number of steps (or steps per epoch plus epochs, from which it computes the total) in order to generate the correct schedule. configure_optimizers does not appear to offer a way of accessing the values needed to initialize OneCycleLR, as in my code below.
def configure_optimizers(self):
    optimizer = torch.optim.AdamW(self.parameters(), lr=self.hparams.lr)
    scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, self.hparams.lr, ???)  # <--- total steps unknown here
    return optimizer, scheduler
Additionally, it's unclear how the fast.ai flow of freeze, train, unfreeze, train works with Lightning, as configure_optimizers appears to be called only once internally by the trainer. It may be possible to train frozen, checkpoint, load, and unfreeze, but that adds some extra code overhead - a rough sketch of what I mean is below.
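For concreteness, something along these lines (MyResnetBasedModel and its backbone attribute are from my own code, and the checkpoint path is just illustrative):

import pytorch_lightning as pl

model = MyResnetBasedModel()
for p in model.backbone.parameters():  # freeze the pretrained backbone
    p.requires_grad = False

trainer = pl.Trainer(max_epochs=5)
trainer.fit(model)  # train the head only

# reload, unfreeze the backbone, and train the whole model with a fresh trainer
model = MyResnetBasedModel.load_from_checkpoint("head_only.ckpt")
for p in model.backbone.parameters():
    p.requires_grad = True

trainer = pl.Trainer(max_epochs=5)
trainer.fit(model)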
How can I arrange my code to use OneCycleLR with pretrained freezing/unfreezing? Any guidance on how to approach this would be appreciated.
Thanks.
An important point about setting the OneCycleLR parameters: if you set the schedule length via epochs and steps_per_epoch, don't forget to take gradient accumulation into account, like this:
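(A minimal sketch of what I mean - accumulate_grad_batches here is whatever value you pass to the Trainer, and the hparams names are my own:)

import math

# steps_per_epoch must count optimizer steps, not batches, so divide the number
# of batches by the gradient accumulation factor
steps_per_epoch = math.ceil(len(self.train_dataloader()) / self.hparams.accumulate_grad_batches)

scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=self.hparams.lr,
    epochs=self.hparams.epochs,
    steps_per_epoch=steps_per_epoch,
)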
Possibly related issues: #1038 #941.
@fabiocapsouza thanks, taking your input, I'm setting up my code as below, where I pass the epochs in via hparams and the steps_per_epoch via len(self.train_dataloader()), which I think should work once everything is in place.

Update: this calls the train_dataloader() function, which is called again after the configure_optimizers step based on the lifecycle in the documentation. It seems like this double call should be avoidable, especially since train_dataloader() could have heavy computation.

Additionally, OneCycleLR needs to be stepped on every batch, and it appears the default is to step the lr scheduler every epoch rather than every batch. I believe the return needs to look something like this based on #941, but I am not sure - the documentation isn't clear on this.
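Something along these lines, where the dict entry with 'interval': 'step' asks Lightning to step the scheduler after every optimizer step rather than once per epoch (the hparams names are my own):

def configure_optimizers(self):
    optimizer = torch.optim.AdamW(self.parameters(), lr=self.hparams.lr)
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer,
        max_lr=self.hparams.lr,
        epochs=self.hparams.epochs,
        steps_per_epoch=len(self.train_dataloader()),
    )
    # step the scheduler every batch instead of the default per-epoch behaviour
    return [optimizer], [{'scheduler': scheduler, 'interval': 'step'}]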