A possible reason why some features might be underused is naming inconsistency.

  • dvc stage {add,list}
  • dvc repro
  • dvc run

surely should be unified as dvc stage {add,list,run} or dvc stage {add,list,repro}? Could sanitising these CLI subcommands be part of the next major release?

Issue Analytics

  • State: open
  • Created a year ago
  • Reactions: 1
  • Comments: 15 (8 by maintainers)

Top GitHub Comments

4 reactions
pmrowla commented, Jun 27, 2022

The use case for dvc run is when a user wants to generate or modify a stage in dvc.yaml. That overlaps with dvc stage add, which is redundant and confusing, and is why removing/deprecating dvc run in favor of dvc stage add is preferred. dvc run is not meant for reproducing a single stage within an existing pipeline/dvc.yaml.

The one thing that dvc run does right now that dvc stage add does not is actually execute the stage command once. This can be useful when generating stages, because it makes DVC verify that all of the outputs you listed for the stage were actually generated and that the command itself executed properly (so run provides a sanity check/validation).

To fill this use case, dvc stage add just needs an extra flag to run the stage once, i.e., dvc stage add --run, as described in #5846.
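Until such a flag exists, the "define the stage, then execute it once" behaviour can be approximated by chaining dvc stage add and dvc repro. The following is only a minimal sketch of that idea: the helper names (stage_add_argv, repro_argv, add_and_run) and the stage, dependency and output names in the usage note are hypothetical, and it deliberately does not use --run, since that flag is only proposed in #5846. By default it is a dry run that prints the commands without calling DVC.

```python
import shlex
import subprocess


def stage_add_argv(name, deps, outs, command):
    """Build the argv for `dvc stage add`, which only writes the stage to dvc.yaml."""
    argv = ["dvc", "stage", "add", "-n", name]
    for dep in deps:
        argv += ["-d", dep]  # one -d per dependency
    for out in outs:
        argv += ["-o", out]  # one -o per declared output
    # The stage command itself follows the options.
    return argv + shlex.split(command)


def repro_argv(name):
    """Build the argv for `dvc repro <name>`, which actually executes the stage."""
    return ["dvc", "repro", name]


def add_and_run(name, deps, outs, command, execute=False):
    """Define a stage, then reproduce it once (hypothetical helper).

    With execute=False (the default) the commands are only printed.
    With execute=True they are run via subprocess, which assumes DVC is
    installed and the current directory is inside a DVC repository.
    """
    steps = [stage_add_argv(name, deps, outs, command), repro_argv(name)]
    for argv in steps:
        print(shlex.join(argv))
        if execute:
            subprocess.run(argv, check=True)
    return steps
```

For example, add_and_run("train", ["train.py", "data"], ["model.pkl"], "python train.py") prints a dvc stage add command followed by dvc repro train. Note that dvc repro may also reproduce upstream stages the new stage depends on, so this is an approximation of what dvc run does, not an exact equivalent.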

A) What are the underlying concepts? AFAIK it’s: 1. Pipelines, 2. Experiments.
B) Does the command-line interface map 1:1 to the above concepts?

  • dvc repro exists to do everything you have described in 1. Pipelines
  • dvc exp run exists to do 2. Experiments.

This seems pretty clear-cut to me.

dvc stage ... exists solely to provide a CLI interface for adding/modifying/removing stages inside dvc.yaml files in the event that a user prefers using the CLI to do it instead of editing the yaml file themselves. It supplements both the “pipelines” and “experiments” use cases, since a user needs to generate dvc.yaml files in both cases.

IMO this is the same as how dvc remote ... exists to add/remove/modify remote entries in a DVC configuration file, while dvc push/pull/fetch exist separately from dvc remote. The remote commands are for configuration; push/pull/fetch fill the use case of “store and retrieve files to/from cloud storage”.

Likewise, stage ... provides configuration (of dvc.yaml files), while repro and exp run are for actually reproducing pipelines and conducting experiments.

3 reactions
pmrowla commented, Jun 9, 2022

I think this is mostly a duplicate of https://github.com/iterative/dvc/issues/5846


