[RFC] Is `fast_dev_run` still needed?
Proposed refactor
Motivation
This was feedback from a user:
Hi, we found an unexpected error when training a model with fast_dev_run=True. The training finishes with a single train and val step, but writing further outputs to the checkpoint directory fails.
How is fast_dev_run mode different from just setting the Trainer to 1 epoch and 1 batch per epoch? My understanding is that fast_dev_run is almost equivalent to limit_train_batches=1 and limit_val_batches=1 (or something like that), but when I set these params instead of fast_dev_run there were no issues at all.
The issue was that within the checkpoint callback, we do not create a checkpoint directory if fast_dev_run is used. It is not obvious at all to users what this flag does based on the documentation; I had to grep through the Lightning codebase to see everywhere that fast_dev_run takes effect. The ambiguity makes it hard to reason about. The framework is also inconsistent with this flag: should the model summary be printed? Should early stopping take effect? What about loggers or profilers? Why some but not others?
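To make the surprise concrete, the behaviour the user hit looks roughly like the following sketch. This is a paraphrase with a made-up helper name, not the actual ModelCheckpoint source; the only part taken from the report above is that directory creation is skipped when fast_dev_run is set.

```python
import os


def _resolve_ckpt_dir(trainer, dirpath):
    # Hypothetical sketch of the guard: filesystem side effects are skipped
    # entirely when the Trainer runs in fast_dev_run mode, so later writes
    # into the "checkpoint directory" fail because it was never created.
    if trainer.fast_dev_run:
        return None
    os.makedirs(dirpath, exist_ok=True)
    return dirpath
```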
We already have Trainer flags for max_steps, limit_train_batches, enable_checkpointing, etc. Do we still need fast_dev_run on top of these?
Pitch
Deprecate fast_dev_run in favor of the specific flags already offered on the Trainer?
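For reference, a minimal sketch of what a user would roughly have to spell out with explicit flags to approximate fast_dev_run=True. The flag names are real Trainer arguments, but the mapping is an approximation of the shortcut, not its exact internal behaviour.

```python
from pytorch_lightning import Trainer

# Rough equivalent of `fast_dev_run=True` written out with explicit flags.
trainer = Trainer(
    max_epochs=1,                # run a single epoch...
    limit_train_batches=1,       # ...with one training batch
    limit_val_batches=1,         # one validation batch
    limit_test_batches=1,        # one test batch, if `.test()` is called
    num_sanity_val_steps=0,      # skip the validation sanity check
    enable_checkpointing=False,  # no checkpoint directory is created
    logger=False,                # no logger artifacts are written
)
```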
Additional context
If you enjoy Lightning, check out our other projects! ⚡
- Metrics: Machine learning metrics for distributed, scalable PyTorch applications.
- Lite: enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.
- Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.
- Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.
- Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers leveraging PyTorch Lightning, Transformers, and Hydra.
Comments: 8 (8 by maintainers)
We skip those that would create undesired effects in future runs. You don’t want the fast_dev_run run to pollute the logger or checkpoint directories.

Yeah, I agree that we need to keep this. If we need to expand the functionality, then we should do that. But yes, non-expert users aren’t going to know to turn these 2-5 things on to get this effect. This is like a “shortcut” more than anything else. So, maybe the action item is to make sure we add these other things to fast_dev_run.
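As a usage sketch of the shortcut being discussed (a minimal example, assuming the current Trainer API where fast_dev_run also accepts an int to run that many batches of each loop):

```python
from pytorch_lightning import Trainer

trainer = Trainer(fast_dev_run=True)  # one batch of each loop, no logger or checkpoint artifacts
trainer = Trainer(fast_dev_run=7)     # same behaviour, but seven batches of each loop
```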