dvc plots show: paths relative to `dvc.yaml` instead of `$PWD` in versions 2.12.1 and 2.13.0
Bug Report
Description
In the new release 2.13.0, `dvc plots show` (or `diff`) does not work when the `dvc.yaml` file is not in the root directory and the data for the plots are on a different path relative to the root.
For example with a repo structure like this
.
├── data
├── dvc_plots
├── modules
├── notebooks
├── pipelines
where
pipelines
├── segment_X
│   ├── classification
│   │   ├── product_A
│   │   │   ├── dvc.lock
│   │   │   ├── dvc.yaml
│   │   │   ├── params.yaml
│   │   ├── product_B
│   │   │   ├── dvc.lock
│   │   │   ├── dvc.yaml
│   │   │   ├── params.yaml
and
data
├── segment_X
│   ├── classification
│   │   ├── product_A
│   │   │   ├── precision_recall_curve.csv
│   │   ├── product_B
│   │   │   ├── precision_recall_curve.csv
where each stage in each `dvc.yaml` file sets
wdir: ../../../..
(i.e. the working directory is the root of the repo), you get the following warning when calling `dvc plots show`:
WARNING: 'pipelines/segment_X/classification/product_A/data/segment_X/classification/product_A/precision_recall_curve.csv' was not found in current workspace.
and similarly for the other pipelines and plots. dvc appears to use the directory of the corresponding `dvc.yaml` as the working directory: the path above is indeed not in the workspace, since the `data` directory is in the root directory.
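The two resolutions can be compared with plain path arithmetic. This is only a sketch of the behaviour described above, not dvc's actual implementation; the function names are hypothetical, and the path values are copied from this report.

```python
import posixpath

# Values copied from the report.
DVC_YAML_DIR = "pipelines/segment_X/classification/product_A"
WDIR = "../../../.."
PLOT_PATH = "data/segment_X/classification/product_A/precision_recall_curve.csv"


def resolve_against_wdir(dvc_yaml_dir: str, wdir: str, path: str) -> str:
    """Expected: wdir is relative to dvc.yaml's directory, path is relative to wdir."""
    stage_wdir = posixpath.normpath(posixpath.join(dvc_yaml_dir, wdir))
    return posixpath.normpath(posixpath.join(stage_wdir, path))


def resolve_against_dvc_yaml_dir(dvc_yaml_dir: str, path: str) -> str:
    """Observed: wdir is ignored and path is joined onto dvc.yaml's directory."""
    return posixpath.normpath(posixpath.join(dvc_yaml_dir, path))


expected = resolve_against_wdir(DVC_YAML_DIR, WDIR, PLOT_PATH)
observed = resolve_against_dvc_yaml_dir(DVC_YAML_DIR, PLOT_PATH)

print("expected:", expected)
print("observed:", observed)
```

The second result is the same path that appears in the warning, which is consistent with `wdir` being ignored when plot paths are collected.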
Reproduce
- Create a pipeline structured as above, with `dvc.yaml` in a different directory than the root and outputs in yet another directory.
- Add some plots to a stage in `dvc.yaml`. The cause might also theoretically come from templating, so try something like
stages:
  plot:
    wdir: ../../../..
    cmd: >-
      python modules/evaluate.py
      --params=${paths.params_file}
    deps: ...
    params:
      - ${paths.params_file}:
          - paths
    metrics: ...
    plots:
      - ${paths.precision_recall_curve}:
          title: "Precision recall curve"
          x: recall
          y: precision
where the `params.yaml` should be in the same directory as the `dvc.yaml` and contain the following:
paths:
  params_file: pipelines/segment_X/classification/product_A/params.yaml
  precision_recall_curve: data/segment_X/classification/product_A/precision_recall_curve.csv
- Call `dvc repro` to create the `precision_recall_curve.csv` in the first place.
- Call `dvc plots show` or `dvc plots diff`.
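To try the steps in a throwaway directory, the layout can be scaffolded with a small script. This is a rough sketch, not part of the original report: it does not call dvc, it skips `dvc.yaml` (whose `deps` and `metrics` are elided above), and the CSV header is an assumption based on the `x: recall` / `y: precision` fields. It only checks which of the two candidate locations holds the plot file.

```python
import tempfile
from pathlib import Path

# Paths copied from the report.
STAGE_DIR = Path("pipelines/segment_X/classification/product_A")
PLOT_FILE = Path("data/segment_X/classification/product_A/precision_recall_curve.csv")

PARAMS_YAML = """\
paths:
  params_file: pipelines/segment_X/classification/product_A/params.yaml
  precision_recall_curve: data/segment_X/classification/product_A/precision_recall_curve.csv
"""


def scaffold(root: Path) -> None:
    """Create the directories and files named in the report under `root`."""
    (root / STAGE_DIR).mkdir(parents=True)
    (root / STAGE_DIR / "params.yaml").write_text(PARAMS_YAML)
    (root / PLOT_FILE).parent.mkdir(parents=True)
    # Header only (assumed column names); real rows would come from `dvc repro`.
    (root / PLOT_FILE).write_text("recall,precision\n")


def candidate_paths(root: Path) -> dict:
    """The two places the plot file could be looked up."""
    return {
        "relative to repo root": root / PLOT_FILE,
        "relative to dvc.yaml": root / STAGE_DIR / PLOT_FILE,
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    scaffold(root)
    found = {label: path.exists() for label, path in candidate_paths(root).items()}

print(found)
```

Only the root-relative location exists in this layout, so any lookup relative to the `dvc.yaml` directory would miss the file.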
Expected
The behaviour of dvc up to 2.12.0, where the plot paths are found correctly and resolved relative to `$PWD` instead of the directory where the corresponding `dvc.yaml` is located.
Environment information
Output of `dvc doctor`:
$ dvc doctor
DVC version: 2.13.0 (pip)
---------------------------------
Platform: Python 3.10.5 on Linux-5.18.7-1-MANJARO-x86_64-with-glibc2.35
Supports:
	azure (adlfs = 2022.4.0, knack = 0.9.0, azure-identity = 1.10.0),
	webhdfs (fsspec = 2022.5.0),
	http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
	https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme0n1p2
Caches: local
Remotes: azure
Workspace directory: ext4 on /dev/nvme0n1p2
Repo: dvc, git
Additional Information (if any):
The new version of dvc has two new dependencies:
dvc-data-0.0.23
dvc-objects-0.0.23
I am not sure if either of these two could be the cause, but they were both bumped from version 0.0.16 in dvc 2.13.0.
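When comparing a working and a broken environment, it can help to print the installed versions of these packages side by side. A minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_versions` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as mentioned in the report.
PACKAGES = ("dvc", "dvc-data", "dvc-objects")


def installed_versions(packages=PACKAGES) -> dict:
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```

Running this in both environments shows whether the dependency versions actually differ between them.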
Issue Analytics
- Created a year ago
- Comments: 7 (4 by maintainers)
Top GitHub Comments
Hi @tibor-mach! Should be fixed in next release
@pared Hi, how's the progress on this one?