
Allow `extra_epochs` flag in `Trainer.fit` to control finetuning time

See original GitHub issue

🚀 Feature

Trainer(max_epochs=100).fit(model, train_dl, ckpt_path=ckpt_path, extra_epochs=True) would fine-tune for 100 additional epochs beyond the resumed checkpoint.

Motivation

Fine-tuning for N epochs currently requires knowing the previous number of epochs M and setting Trainer(max_epochs=M+N). Googling did not turn up a way to achieve this otherwise.

Pitch

The fine-tuning duration, either as a number of epochs or as training time, should be configurable when resuming from a checkpoint.

Alternatives

Setting a large max_epochs and stopping training manually.

Additional context

It would also be nice to have the same option for max_time. I hope this is already solved and this issue is unnecessary.


cc @justusschock @kaushikb11 @awaelchli @borda @rohitgr7

Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
carmocca commented, Jun 14, 2022

You can accomplish this by running:

trainer.fit_loop.max_epochs += 100

before trainer.fit() is called.
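
A minimal sketch of this workaround, assuming a hypothetical LightningModule MyModel, a dataloader train_dl, and a checkpoint path ckpt_path (these names are illustrative, not from the issue):

import pytorch_lightning as pl

model = MyModel()  # hypothetical LightningModule
trainer = pl.Trainer(max_epochs=100)

# If the checkpoint was saved after max_epochs (here 100) epochs, raising
# the fit loop's budget by 100 makes the resumed run train for 100 more
# epochs instead of stopping immediately.
trainer.fit_loop.max_epochs += 100
trainer.fit(model, train_dl, ckpt_path=ckpt_path)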

0 reactions
carmocca commented, Jul 26, 2022

There are 2 potential solutions:

  1. Pre-load the checkpoint manually:

ckpt = torch.load(...)
current_epoch = ckpt["current_epoch"]
trainer = Trainer(max_epochs=current_epoch + N)

An issue with this method is that it loads the full checkpoint just for this change. This relates to #5339 and https://github.com/Lightning-AI/lightning/issues/12712

  2. Extract the state from the checkpoint in on_load_checkpoint and modify the Trainer’s max_epochs. This requires editing the LightningModule hook to do this or creating a Callback just for it; a sketch of the Callback approach is shown below.
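
A hedged sketch of option 2, using a Callback whose on_load_checkpoint hook reads the resumed epoch and extends max_epochs. The ExtendTraining name, the extra_epochs argument, and the "epoch" checkpoint key are assumptions for illustration; the exact key name and whether the hook runs early enough to take effect may depend on the Lightning version.

from pytorch_lightning import Callback, Trainer

class ExtendTraining(Callback):
    # Hypothetical callback: train for `extra_epochs` more epochs after resuming.
    def __init__(self, extra_epochs: int):
        self.extra_epochs = extra_epochs

    def on_load_checkpoint(self, trainer, pl_module, checkpoint):
        # Lightning checkpoints typically record the epoch they were saved at;
        # verify the key name ("epoch" here) against your checkpoint contents.
        resumed_epoch = checkpoint.get("epoch", 0)
        trainer.fit_loop.max_epochs = resumed_epoch + self.extra_epochs

trainer = Trainer(callbacks=[ExtendTraining(extra_epochs=100)])
trainer.fit(model, train_dl, ckpt_path=ckpt_path)  # model / train_dl / ckpt_path as in the sketch above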
Read more comments on GitHub >

Top Results From Across the Web

Trainer — PyTorch Lightning 1.8.5.post0 documentation
Running the training, validation and test dataloaders. Calling the Callbacks at the appropriate times. Putting batches and computations on the correct devices.
Read more >
Trainer - Hugging Face
Trainer. The Trainer class provides an API for feature-complete training in PyTorch for most standard use cases. It's used in most of the...
Read more >
Is my training finetuing RoBERTa normal? · Issue #999 - GitHub
I followed the official instruction finetune_custom_classification.md. The ACC of mini-batchs is only 72 after 4.5 epochs and there is...
Read more >
Transformer Model — darts documentation - GitHub Pages
For more information on PyTorch Lightning Trainers check out this link. This function can be called several times to do some extra...
Read more >
Training (tune.Trainable, session.report) — Ray 2.2.0
The Function API allows you to define a custom training function that Tune will run in parallel Ray actor processes, one for each...
Read more >
