Allow `extra_epochs` flag in `Trainer.fit` to control finetuning time
🚀 Feature
`Trainer(max_epochs=100).fit(model, train_dl, ckpt_path=ckpt_path, extra_epochs=True)`
would finetune for 100 additional epochs, regardless of how many epochs the checkpoint was already trained for.
Motivation
Finetuning for N epochs requires knowing the previous number of epochs M and setting `Trainer(max_epochs=M+N)`. Googling did not turn up a simpler way to achieve this.
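For illustration, a minimal sketch of that workaround, assuming a standard Lightning checkpoint that stores an `epoch` key and pre-existing `model` and `train_dl` objects (the checkpoint path is hypothetical):

```python
import torch
from pytorch_lightning import Trainer

ckpt_path = "checkpoints/last.ckpt"  # hypothetical checkpoint path

# Read the number of epochs already trained (M) from the checkpoint,
# then request M + N total epochs so that exactly N more are run.
previous_epochs = torch.load(ckpt_path, map_location="cpu")["epoch"]
extra_epochs = 100  # N

trainer = Trainer(max_epochs=previous_epochs + extra_epochs)
trainer.fit(model, train_dl, ckpt_path=ckpt_path)
```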
Pitch
The finetuning duration (training time or number of epochs) should be directly configurable.
Alternatives
Setting a very large `max_epochs` and stopping training manually.
Additional context
It would be cool to have this for `max_time` too. I hope this is already solved and this issue is unnecessary.
Issue Analytics
- Created: a year ago
- Comments: 5 (3 by maintainers)
You accomplish this by adjusting `max_epochs` before `trainer.fit()` is called. There are 2 potential solutions:
1. Load the checkpoint and set the Trainer's `max_epochs` based on the epoch stored in it. An issue with this method is that it loads the full checkpoint just for this change. This relates to #5339 and https://github.com/Lightning-AI/lightning/issues/12712
2. Use `on_load_checkpoint` and modify the Trainer's `max_epochs`. This requires editing the `LightningModule` hook to do this or creating a `Callback` just for it.
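For the second option, a minimal sketch of a `Callback` built around `on_load_checkpoint` (the class name `ExtendByEpochs` is made up here, and setting `trainer.fit_loop.max_epochs` is an assumption that may vary across Lightning versions):

```python
from pytorch_lightning import Callback, Trainer

class ExtendByEpochs(Callback):
    """Hypothetical callback: continue for `extra_epochs` more epochs after resuming."""

    def __init__(self, extra_epochs: int):
        self.extra_epochs = extra_epochs

    def on_load_checkpoint(self, trainer, pl_module, checkpoint):
        # The restored checkpoint dict already carries the epoch it was saved at,
        # so no separate torch.load of the file is needed.
        trainer.fit_loop.max_epochs = checkpoint["epoch"] + self.extra_epochs

# The initial max_epochs is only a placeholder; the callback overrides it when the
# checkpoint is restored (assumed to happen before the stopping condition is checked).
trainer = Trainer(max_epochs=1, callbacks=[ExtendByEpochs(100)])
trainer.fit(model, train_dl, ckpt_path=ckpt_path)  # model, train_dl, ckpt_path as above
```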