dvc stage repro
A possible reason why some features might be underused is naming inconsistency. The commands
dvc stage {add,list}
dvc repro
dvc run
surely should be unified as
dvc stage {add,list,run}
or
dvc stage {add,list,repro}
Could sanitising these CLI subcommands be part of the next major release?
Issue Analytics
- State:
- Created a year ago
- Reactions:1
- Comments:15 (8 by maintainers)
Top GitHub Comments
The use case for dvc run is for when a user wants to generate or modify a stage in dvc.yaml. This is redundant and confusing, which is why removing or deprecating it in favor of dvc stage add is preferred. The use case for dvc run is not for reproducing a single stage within an existing pipeline/dvc.yaml.

The one thing that dvc run does right now that dvc stage add does not is that dvc run can actually execute the stage command once. This can be useful when generating stages, because it makes DVC verify that all of the outputs you listed for the stage were actually generated and that the command itself was executed properly (so run provides a sanity check/validation).

To fill this use case, dvc stage add just needs an extra flag to run the stage once, i.e. dvc stage add --run, as described in #5846.

dvc repro exists to do everything you have described in "1. Pipelines". dvc exp run exists to do "2. Experiments". This seems pretty clear-cut to me.

dvc stage ... exists solely to provide a CLI for adding, modifying, and removing stages inside dvc.yaml files, in the event that a user prefers using the CLI instead of editing the YAML file themselves. It supplements both the "pipelines" and "experiments" use cases, since a user needs to generate dvc.yaml files in both cases.

IMO this is the same as dvc remote ... existing to add, remove, and modify remote entries in a DVC configuration file, while dvc push/pull/fetch exist separately from dvc remote .... The remote commands are for configuration. push/pull/fetch fill the use case of "store and retrieve files to/from cloud storage".

Likewise, the stage ... commands provide configuration (of dvc.yaml files). repro and exp run are for actually reproducing pipelines and conducting experiments.

I think this is mostly a duplicate of https://github.com/iterative/dvc/issues/5846