configurable default params.yaml (or templating entire pipelines)
Hi, I have a setup where I use a single pipeline (with several stages) to train multiple models that are almost identical but use different training data and parameters.
I currently keep a copy of the dvc.yaml pipeline in a separate folder for each model, together with the respective params.yaml file used for that model. It looks more or less like this:
```yaml
stages:
  train_test_split:
    wdir: ../../../..
    cmd: >-
      python modules/regression/train_test_split.py
      --params=${paths.params_file}
    deps: ...
    outs: ...
    params:
      - ${paths.params_file}:  # needs to be set due to a different working directory
          - paths.data_all
          - train_test_split
  assemble_model: ...
  optimize_hyperparams: ...
  fit_model: ...
  evaluate: ...
```
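The `cmd` above uses a folded block scalar (`>-`), so YAML joins its two lines with a single space and drops the trailing newline; the `${paths.params_file}` placeholder is then filled in from the params data. As a rough illustration of that substitution step, here is a minimal sketch that looks up a dotted key in a nested params dictionary. It is a simplified stand-in, not DVC's actual resolver, and the path `models/model_a/params.yaml` is hypothetical.

```python
import re

# Matches ${dotted.key} placeholders such as ${paths.params_file}.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


def lookup(params: dict, dotted_key: str):
    """Walk a nested dict following a dotted key like 'paths.params_file'."""
    node = params
    for part in dotted_key.split("."):
        node = node[part]  # raises KeyError if the key is missing
    return node


def interpolate(text: str, params: dict) -> str:
    """Replace every ${dotted.key} in text with its value from params."""
    return _PLACEHOLDER.sub(lambda m: str(lookup(params, m.group(1))), text)


if __name__ == "__main__":
    # Hypothetical params content for one model.
    params = {"paths": {"params_file": "models/model_a/params.yaml"}}
    # The folded '>-' command, already joined into one line.
    cmd = ("python modules/regression/train_test_split.py "
           "--params=${paths.params_file}")
    print(interpolate(cmd, params))
```

A missing key raises `KeyError` rather than silently leaving the placeholder in place, which is usually what you want in a pipeline definition.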
This works (I then always run dvc repro -P), but I have to copy the pipeline file, which makes versioning difficult. The only part that is not templated (since, AFAIK, it cannot be) is the default params file.
I would love to have a dvc.yaml file in the root folder of my project that can be run with several different params.yaml files from several locations. Kind of like foreach ... do, but at the level of the entire pipeline.
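To make the "foreach at the pipeline level" idea concrete, here is a minimal sketch that takes a template pipeline (as a plain dict) and creates one copy of every stage per params file, substituting the `${paths.params_file}` placeholder. This is a hypothetical helper, not an existing DVC feature; the instance names, paths, and the `stage@instance` naming scheme (similar to the names generated by DVC's stage-level `foreach`) are assumptions. Writing the result back to a dvc.yaml would need a YAML library, and it would still share a single lock file.

```python
PLACEHOLDER = "${paths.params_file}"  # the variable the original pipeline uses


def _substitute(node, params_file: str):
    """Recursively replace PLACEHOLDER in strings, list items, and dict keys."""
    if isinstance(node, str):
        return node.replace(PLACEHOLDER, params_file)
    if isinstance(node, list):
        return [_substitute(item, params_file) for item in node]
    if isinstance(node, dict):
        return {_substitute(k, params_file): _substitute(v, params_file)
                for k, v in node.items()}
    return node  # numbers, None, etc. are left unchanged


def expand_pipeline(template: dict, params_files: dict) -> dict:
    """Create one copy of every template stage per params file.

    params_files maps a short instance name (e.g. 'model_a') to the path of
    that instance's params file. Generated stages are named 'stage@instance'.
    The template is not modified; new containers are built for each copy.
    """
    stages = {}
    for instance, params_file in params_files.items():
        for name, stage in template["stages"].items():
            stages[f"{name}@{instance}"] = _substitute(stage, params_file)
    return {"stages": stages}


if __name__ == "__main__":
    template = {"stages": {"train_test_split": {
        "cmd": ("python modules/regression/train_test_split.py "
                "--params=${paths.params_file}"),
        "params": [{"${paths.params_file}": ["paths.data_all", "train_test_split"]}],
    }}}
    pipeline = expand_pipeline(template, {
        "model_a": "models/model_a/params.yaml",  # hypothetical locations
        "model_b": "models/model_b/params.yaml",
    })
    print(sorted(pipeline["stages"]))
```

Running it prints the two generated stage names, `train_test_split@model_a` and `train_test_split@model_b`, each pointing at its own params file.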
Also, I believe I have to explicitly add the path to the params file under the params keyword when running the stage from a different working directory… Not sure if that is a bug or a feature 😃
Thanks a lot!
P.S.: I tried a similar setup with templating all the stages, but there are limitations in the way templating and foreach work right now, and I also feel this would be a more elegant approach. The pipelines and overall architecture are the same; what differs are the training data and (some) parameters. So an option like “for each params file in a list, reproduce a separate instance of the pipeline” would make a lot of sense to me (it would then also make sense to have separate lock files).
Top GitHub Comments
Is this something I could perhaps help with (in case it is a feature you'd like to include)? I am not very familiar with the inner workings of DVC at this level of detail, but this (a configurable default params file) does not sound particularly complicated, and it would definitely help me a lot, so I'd love to help implement it.
The params path will be interpreted as relative to wdir, the same way as deps and outs. So if you don't specify a params path, it will look for params.yaml in wdir.
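The rule in this reply also explains the behavior asked about earlier: with wdir: ../../../.., a bare params.yaml resolves to the project root, not to the model's own folder. Below is a minimal, purely lexical sketch of that resolution, assuming wdir is relative to the directory containing dvc.yaml and the params path is relative to wdir. The folder layout `pipelines/regression/models/model_a/` is hypothetical, and this is an illustration of the rule, not DVC's code.

```python
import posixpath


def resolve_params_path(dvc_yaml: str, wdir: str = ".",
                        params_file: str = "params.yaml") -> str:
    """Resolve a stage's params file relative to its wdir.

    wdir is taken relative to the directory that contains dvc.yaml, and
    params_file relative to wdir. Resolution is lexical only: '.' and '..'
    are collapsed without touching the filesystem.
    """
    stage_dir = posixpath.join(posixpath.dirname(dvc_yaml), wdir)
    return posixpath.normpath(posixpath.join(stage_dir, params_file))


if __name__ == "__main__":
    dvc_yaml = "pipelines/regression/models/model_a/dvc.yaml"  # hypothetical layout
    # Default params file: lands in the project root because of wdir.
    print(resolve_params_path(dvc_yaml, "../../../.."))
    # Explicit params file, written relative to the project root (the wdir).
    print(resolve_params_path(dvc_yaml, "../../../..",
                              "pipelines/regression/models/model_a/params.yaml"))
```

The first call yields `params.yaml` (the root file), which is why the per-model params file has to be listed explicitly under params when the stage runs from a different working directory.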