
configurable default params.yaml (or templating entire pipelines)


Hi, I have a setup where I use a single pipeline (with several stages) for training multiple models which are almost the same, but use different training data and parameters.

For each model, I currently keep a copy of the dvc.yaml pipeline in a folder together with that model's params.yaml file. It looks more or less like this:

stages:
  train_test_split:
    wdir: ../../../..
    cmd: >-
      python modules/regression/train_test_split.py
      --params=${paths.params_file}
    deps: ...
    outs: ...
    params:
      - ${paths.params_file}: # needs to be set due to a different working directory
          - paths.data_all
          - train_test_split
  assemble_model: ...
  optimize_hyperparams: ...
  fit_model: ...
  evaluate: ...

This works (I always run it with dvc repro -P), but I have to copy the pipeline file for every model, which makes versioning difficult. The only part that is not templated (since, AFAIK, it cannot be) is the default params file.

I would love to have a single dvc.yaml file in the root folder of my project that can be run with several different params.yaml files from different locations, kind of like foreach ... do, but at the level of the entire pipeline.

Also, I believe I have to explicitly add the path to the params file under the params keyword when I run the stage from a different working directory… Not sure if that is a bug or a feature 😃

Thanks a lot!

P.S.: I tried a similar setup by templating all the stages, but there are limitations in how templating and foreach work right now, and I also feel that a pipeline-level option would be more elegant. The pipelines and the overall architecture are the same; only the training data and (some) parameters differ. So an option like “for each params file in the list, reproduce a separate instance of the pipeline” would make a lot of sense to me (it would then also make sense to have separate lock files).
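For comparison, the per-stage templating mentioned in the P.S. looks roughly like the sketch below, using DVC's existing vars and foreach ... do syntax in a root-level dvc.yaml. This is a minimal sketch under stated assumptions: the model names (model_a, model_b) and the models/<name>/params.yaml paths are hypothetical placeholders, deps/outs are omitted, and the templated params file key follows the pattern the issue author reports as working. Each remaining stage would need the same foreach block, and all generated stages are recorded in the single dvc.lock next to this dvc.yaml, which is exactly what the request above wants to avoid.

```yaml
# Hypothetical model names and params file locations; adjust to your layout.
vars:
  - models:
      model_a:
        params_file: models/model_a/params.yaml
      model_b:
        params_file: models/model_b/params.yaml

stages:
  train_test_split:
    # DVC generates one instance of this stage per entry in models.
    foreach: ${models}
    do:
      cmd: >-
        python modules/regression/train_test_split.py
        --params=${item.params_file}
      params:
        # Explicit params file per model (templated key, as in the issue).
        - ${item.params_file}:
            - train_test_split
  # assemble_model, optimize_hyperparams, fit_model and evaluate would each
  # need the same foreach block repeated.
```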

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 9 (2 by maintainers)

Top GitHub Comments

tibor-mach commented, Jun 28, 2022

Is that something I could perhaps help with (in case this is a feature you’d like to include)? I am not very familiar with the inner workings of DVC at this level of detail, but this (a configurable default params file) does not sound particularly complicated, and it would definitely help me a lot, so I’d love to help implement it.

pmrowla commented, Jun 28, 2022

> Also, I believe I have to explicitly add the path to the params file under the params keyword when I am running the stage from a different working directory…Not sure if that is a bug or a feature 😃

The params path is interpreted relative to wdir, the same way as deps and outs. So if you don’t specify a params path, DVC looks for params.yaml in wdir.
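To illustrate that rule with the stage from the issue, here is a sketch (configs/other.yaml is a hypothetical file name used only for illustration). With wdir set to ../../../.., a params entry without a file name is read from params.yaml in that directory, and an explicit file path is resolved relative to the same directory, not relative to the folder containing dvc.yaml.

```yaml
stages:
  train_test_split:
    wdir: ../../../..
    cmd: python modules/regression/train_test_split.py
    params:
      # No file given: read from params.yaml in wdir (../../../../params.yaml),
      # not from the folder that contains this dvc.yaml.
      - train_test_split
      # Explicit file: also resolved relative to wdir,
      # i.e. ../../../../configs/other.yaml (hypothetical file).
      - configs/other.yaml:
          - paths.data_all
```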

