[RFC] Is `fast_dev_run` still needed?
Proposed refactor
Motivation
This was feedback from a user:
Hi, we found an unexpected error when training a model with fast_dev_run=True. The training finishes with a single train and val step, but writing further outputs to the checkpoint directory fails.
How is fast_dev_run mode different from just setting the Trainer to 1 epoch and 1 batch per epoch? My understanding is that fast_dev_run is almost equivalent to limit_train_batches=1 and limit_val_batches=1 (or something like that), but when I set these params instead of fast_dev_run there were no issues at all.
The issue was that within the checkpoint callback, we do not create a checkpoint directory if fast_dev_run is used. It is not obvious at all to users what this flag does based on the documentation; I had to grep through the Lightning codebase to see everywhere that fast_dev_run takes effect. The ambiguity makes it hard to reason about. The framework is also inconsistent with this flag: should the model summary be printed? Should early stopping take effect? What about loggers or profilers? Why some but not others?
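To make the surprise concrete, the behaviour the user hit looks roughly like the following sketch. This is a paraphrase with a made-up helper name, not the actual ModelCheckpoint source; the only part taken from the report above is that directory creation is skipped when fast_dev_run is set.

```python
import os


def _resolve_ckpt_dir(trainer, dirpath):
    # Hypothetical sketch of the guard: filesystem side effects are skipped
    # entirely when the Trainer runs in fast_dev_run mode, so later writes
    # into the "checkpoint directory" fail because it was never created.
    if trainer.fast_dev_run:
        return None
    os.makedirs(dirpath, exist_ok=True)
    return dirpath
```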
We already have Trainer flags for max_steps, limit_train_batches, enable_checkpointing, etc. Do we still need fast_dev_run on top of these?
Pitch
Deprecate fast_dev_run in favor of the specific flags already offered on the Trainer?
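For reference, a minimal sketch of what a user would roughly have to spell out with explicit flags to approximate fast_dev_run=True. The flag names are real Trainer arguments, but the mapping is an approximation of the shortcut, not its exact internal behaviour.

```python
from pytorch_lightning import Trainer

# Rough equivalent of `fast_dev_run=True` written out with explicit flags.
trainer = Trainer(
    max_epochs=1,                # run a single epoch...
    limit_train_batches=1,       # ...with one training batch
    limit_val_batches=1,         # one validation batch
    limit_test_batches=1,        # one test batch, if `.test()` is called
    num_sanity_val_steps=0,      # skip the validation sanity check
    enable_checkpointing=False,  # no checkpoint directory is created
    logger=False,                # no logger artifacts are written
)
```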
Additional context
If you enjoy Lightning, check out our other projects! ⚡
- Metrics: Machine learning metrics for distributed, scalable PyTorch applications.
- Lite: enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.
- Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.
- Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.
- Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers leveraging PyTorch Lightning, Transformers, and Hydra.
Comments: 8 (8 by maintainers)
We skip those that would create undesired effects in future runs. You don’t want the fast_dev_run run to pollute the logger or checkpoint directories.

Yeah, I agree that we need to keep this. If we need to expand the functionality, then we should do that. But yes, non-expert users aren’t going to know to turn these 2-5 things on to get this effect. This is like a “shortcut” more than anything else. So, maybe the action item is to make sure we add these other things to fast_dev_run.
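As a usage sketch of the shortcut being discussed (a minimal example, assuming the current Trainer API where fast_dev_run also accepts an int to run that many batches of each loop):

```python
from pytorch_lightning import Trainer

trainer = Trainer(fast_dev_run=True)  # one batch of each loop, no logger or checkpoint artifacts
trainer = Trainer(fast_dev_run=7)     # same behaviour, but seven batches of each loop
```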