For 2.17.0 --> dvc status says http dependency has changed and dvc repro won't work
Bug Report
dvc status says dependency has changed and dvc repro won’t work
Description
Since version 2.17.0, the dvc status command reports that an HTTP dependency has changed when it has not.
Reproduce
# dvc.yaml
stages:
  datasets:
    foreach:
      t10k-images-idx3-ubyte.gz: http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
      t10k-labels-idx1-ubyte.gz: http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
    do:
      cmd:
        - wget ${item}
      deps:
        - ${item}
      outs:
        - ${key}
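For anyone not familiar with foreach, here is a rough sketch of how the mapping above turns into two concrete stages, each with the URL as its only dependency. It is an illustration only, not DVC's implementation: `expand_foreach` and `substitute` are hypothetical helpers, and plain string replacement stands in for DVC's real templating.

```python
# Illustrative only: mimics how a dict-style `foreach` expands into stages.
# `expand_foreach` is a hypothetical helper, not part of the DVC API.

FOREACH = {
    "t10k-images-idx3-ubyte.gz": "http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz",
    "t10k-labels-idx1-ubyte.gz": "http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz",
}
DO = {
    "cmd": ["wget ${item}"],
    "deps": ["${item}"],
    "outs": ["${key}"],
}


def substitute(value, key, item):
    """Replace ${key} and ${item} in a string or a list of strings."""
    if isinstance(value, list):
        return [substitute(entry, key, item) for entry in value]
    return value.replace("${key}", key).replace("${item}", item)


def expand_foreach(stage_name, foreach, do):
    """Return one concrete stage per foreach entry, named '<stage>@<key>'."""
    return {
        f"{stage_name}@{key}": {
            field: substitute(value, key, item) for field, value in do.items()
        }
        for key, item in foreach.items()
    }


stages = expand_foreach("datasets", FOREACH, DO)
for name, stage in stages.items():
    print(name, "deps:", stage["deps"], "outs:", stage["outs"])
```

The generated names follow the datasets@<key> pattern that appears in the dvc repro output for 2.16.0.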
➜ dvc init --no-scm
Initialized DVC repository.
What's next?
------------
- Check out the documentation: <https://dvc.org/doc>
- Get help and share ideas: <https://dvc.org/chat>
- Star us on GitHub: <https://github.com/iterative/dvc>
aalvarez in /tmp/dvc_test at ml-2
➜ dvc status
datasets@test-set-images:
    changed deps:
        deleted: http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
    changed outs:
        deleted: data/test-set-images
datasets@test-set-labels:
    changed deps:
        deleted: http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
    changed outs:
        deleted: data/test-set-labels
aalvarez in /tmp/dvc_test at ml-2
➜ dvc repro
ERROR: failed to reproduce 'datasets@test-set-images': dependency 'http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz' does not exist
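To rule out the server as the cause of the "does not exist" error, the URL can be probed outside DVC. Below is a minimal sketch using only the Python standard library. It assumes a HEAD request is an acceptable probe, which is not necessarily how DVC itself checks HTTP dependencies, and `url_exists` is a hypothetical helper name.

```python
import urllib.error
import urllib.request


def url_exists(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to `url` gets a non-error response.

    Hypothetical helper for debugging; not part of DVC.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except urllib.error.HTTPError:
        # The server answered, but with a 4xx/5xx status.
        return False
    except OSError:
        # DNS failure, refused connection, timeout, and similar errors.
        return False
```

Calling url_exists with each URL from dvc.yaml shows whether the files are reachable from the same machine. If they are, the server is unlikely to be the cause of the error.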
Expected
With version 2.16.0
➜ dvc repro
Running stage 'datasets@t10k-labels-idx1-ubyte.gz':
> wget http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
--2022-08-11 11:04:51-- http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Resolving yann.lecun.com (yann.lecun.com)... 104.21.29.36, 172.67.171.76, 2606:4700:3034::6815:1d24, ...
Connecting to yann.lecun.com (yann.lecun.com)|104.21.29.36|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4542 (4.4K) [application/x-gzip]
Saving to: 't10k-labels-idx1-ubyte.gz'
t10k-labels-idx1-ubyte.gz 100%[====================================================================================================>] 4.44K --.-KB/s in 0s
2022-08-11 11:04:51 (320 MB/s) - 't10k-labels-idx1-ubyte.gz' saved [4542/4542]
Generating lock file 'dvc.lock'
Updating lock file 'dvc.lock'
Running stage 'datasets@t10k-images-idx3-ubyte.gz':
> wget http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
--2022-08-11 11:04:51-- http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Resolving yann.lecun.com (yann.lecun.com)... 104.21.29.36, 172.67.171.76, 2606:4700:3034::6815:1d24, ...
Connecting to yann.lecun.com (yann.lecun.com)|104.21.29.36|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1648877 (1.6M) [application/x-gzip]
Saving to: 't10k-images-idx3-ubyte.gz'
t10k-images-idx3-ubyte.gz 100%[====================================================================================================>] 1.57M 8.03MB/s in 0.2s
2022-08-11 11:04:51 (8.03 MB/s) - 't10k-images-idx3-ubyte.gz' saved [1648877/1648877]
Updating lock file 'dvc.lock'
Use `dvc push` to send your updates to remote storage.
Environment information
Output of dvc doctor:
DVC version: 2.17.0 (deb)
---------------------------------
Platform: Python 3.8.3 on Linux-5.4.0-121-generic-x86_64-with-glibc2.14
Supports:
azure (adlfs = 2022.7.0, knack = 0.9.0, azure-identity = 1.10.0),
gdrive (pydrive2 = 1.14.0),
gs (gcsfs = 2022.7.1),
hdfs (fsspec = 2022.7.1, pyarrow = 9.0.0),
webhdfs (fsspec = 2022.7.1),
http (aiohttp = 3.8.1, aiohttp-retry = 2.6.1),
https (aiohttp = 3.8.1, aiohttp-retry = 2.6.1),
s3 (s3fs = 2022.7.1, boto3 = 1.21.21),
ssh (sshfs = 2022.6.0),
oss (ossfs = 2021.8.0),
webdav (webdav4 = 0.9.7),
webdavs (webdav4 = 0.9.7)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/mapper/ubuntu--vg-ubuntu--lv
Caches: local
Remotes: None
Workspace directory: ext4 on /dev/mapper/ubuntu--vg-ubuntu--lv
Repo: dvc (no_scm)
and
DVC version: 2.16.0 (deb)
---------------------------------
Platform: Python 3.8.3 on Linux-5.4.0-121-generic-x86_64-with-glibc2.14
Supports:
azure (adlfs = 2022.7.0, knack = 0.9.0, azure-identity = 1.10.0),
gdrive (pydrive2 = 1.14.0),
gs (gcsfs = 2022.7.1),
hdfs (fsspec = 2022.7.1, pyarrow = 8.0.0),
webhdfs (fsspec = 2022.7.1),
http (aiohttp = 3.8.1, aiohttp-retry = 2.5.3),
https (aiohttp = 3.8.1, aiohttp-retry = 2.5.3),
s3 (s3fs = 2022.7.1, boto3 = 1.21.21),
ssh (sshfs = 2022.6.0),
oss (ossfs = 2021.8.0),
webdav (webdav4 = 0.9.7),
webdavs (webdav4 = 0.9.7)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/mapper/ubuntu--vg-ubuntu--lv
Caches: local
Remotes: None
Workspace directory: ext4 on /dev/mapper/ubuntu--vg-ubuntu--lv
Repo: dvc (no_scm)
Issue Analytics
- State:
- Created a year ago
- Comments:5 (3 by maintainers)
Top GitHub Comments
Hi, thanks for the fast fix. Unfortunately, I won’t be able to test it for the next two weeks. I’m on holidays 😁
dvc get-url is also broken for http remotes. I am not sure if other remotes are also affected.