For 2.17.0 --> dvc status says http dependency has changed and dvc repro won't work

Bug Report

dvc status says dependency has changed and dvc repro won’t work

Description

Since version 2.17.0, the dvc status command reports that an HTTP dependency has changed (it is listed as deleted) when it has not.

Reproduce

# dvc.yaml
stages:
  datasets:
    foreach:
      t10k-images-idx3-ubyte.gz: http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
      t10k-labels-idx1-ubyte.gz: http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
    do:
      cmd:
        - wget ${item}
      deps:
        - ${item}
      outs:
        - ${key}

➜ dvc init --no-scm
Initialized DVC repository.

What's next?
------------
- Check out the documentation: <https://dvc.org/doc>
- Get help and share ideas: <https://dvc.org/chat>
- Star us on GitHub: <https://github.com/iterative/dvc>

aalvarez in /tmp/dvc_test at ml-2
➜ dvc status
datasets@test-set-images:
        changed deps:
                deleted:            http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
        changed outs:
                deleted:            data/test-set-images
datasets@test-set-labels:
        changed deps:
                deleted:            http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
        changed outs:
                deleted:            data/test-set-labels

aalvarez in /tmp/dvc_test at ml-2
➜ dvc repro
ERROR: failed to reproduce 'datasets@test-set-images': dependency 'http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz' does not exist
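
To check quickly whether a project is hitting the same symptom, the plain-text `dvc status` output can be scanned for HTTP(S) dependencies reported as deleted. The following is a minimal sketch, not part of DVC: the function name `deleted_http_deps` is hypothetical, and it assumes the indentation-based layout shown in the output above.

```python
import re
from typing import Dict, List, Optional


def deleted_http_deps(status_text: str) -> Dict[str, List[str]]:
    """Map each stage to the HTTP(S) deps that `dvc status` lists as deleted.

    Assumes the layout shown in this report: an unindented `stage:` line,
    followed by indented `changed deps:` / `changed outs:` sections.
    """
    result: Dict[str, List[str]] = {}
    stage: Optional[str] = None
    section: Optional[str] = None
    for line in status_text.splitlines():
        if not line.strip():
            continue
        # An unindented line ending in ':' starts a new stage block.
        if not line[0].isspace() and line.rstrip().endswith(":"):
            stage = line.rstrip()[:-1]
            section = None
            continue
        stripped = line.strip()
        if stripped in ("changed deps:", "changed outs:"):
            section = stripped
            continue
        match = re.match(r"deleted:\s+(https?://\S+)$", stripped)
        if match and stage and section == "changed deps:":
            result.setdefault(stage, []).append(match.group(1))
    return result
```

Calling `deleted_http_deps` on the captured output of `dvc status` returns an empty dict when no HTTP dependency is flagged as deleted. A non-empty result for URLs that are still reachable suggests the regression described here.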

Expected

With version 2.16.0

➜ dvc repro
Running stage 'datasets@t10k-labels-idx1-ubyte.gz':
> wget http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
--2022-08-11 11:04:51--  http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Resolving yann.lecun.com (yann.lecun.com)... 104.21.29.36, 172.67.171.76, 2606:4700:3034::6815:1d24, ...
Connecting to yann.lecun.com (yann.lecun.com)|104.21.29.36|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4542 (4.4K) [application/x-gzip]
Saving to: 't10k-labels-idx1-ubyte.gz'

t10k-labels-idx1-ubyte.gz                      100%[====================================================================================================>]   4.44K  --.-KB/s    in 0s

2022-08-11 11:04:51 (320 MB/s) - 't10k-labels-idx1-ubyte.gz' saved [4542/4542]

Generating lock file 'dvc.lock'
Updating lock file 'dvc.lock'

Running stage 'datasets@t10k-images-idx3-ubyte.gz':
> wget http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
--2022-08-11 11:04:51--  http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Resolving yann.lecun.com (yann.lecun.com)... 104.21.29.36, 172.67.171.76, 2606:4700:3034::6815:1d24, ...
Connecting to yann.lecun.com (yann.lecun.com)|104.21.29.36|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1648877 (1.6M) [application/x-gzip]
Saving to: 't10k-images-idx3-ubyte.gz'

t10k-images-idx3-ubyte.gz                      100%[====================================================================================================>]   1.57M  8.03MB/s    in 0.2s

2022-08-11 11:04:51 (8.03 MB/s) - 't10k-images-idx3-ubyte.gz' saved [1648877/1648877]

Updating lock file 'dvc.lock'
Use `dvc push` to send your updates to remote storage.

Environment information

Output of dvc doctor:

DVC version: 2.17.0 (deb)
---------------------------------
Platform: Python 3.8.3 on Linux-5.4.0-121-generic-x86_64-with-glibc2.14
Supports:
        azure (adlfs = 2022.7.0, knack = 0.9.0, azure-identity = 1.10.0),
        gdrive (pydrive2 = 1.14.0),
        gs (gcsfs = 2022.7.1),
        hdfs (fsspec = 2022.7.1, pyarrow = 9.0.0),
        webhdfs (fsspec = 2022.7.1),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.6.1),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.6.1),
        s3 (s3fs = 2022.7.1, boto3 = 1.21.21),
        ssh (sshfs = 2022.6.0),
        oss (ossfs = 2021.8.0),
        webdav (webdav4 = 0.9.7),
        webdavs (webdav4 = 0.9.7)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/mapper/ubuntu--vg-ubuntu--lv
Caches: local
Remotes: None
Workspace directory: ext4 on /dev/mapper/ubuntu--vg-ubuntu--lv
Repo: dvc (no_scm)

and

DVC version: 2.16.0 (deb)
---------------------------------
Platform: Python 3.8.3 on Linux-5.4.0-121-generic-x86_64-with-glibc2.14
Supports:
        azure (adlfs = 2022.7.0, knack = 0.9.0, azure-identity = 1.10.0),
        gdrive (pydrive2 = 1.14.0),
        gs (gcsfs = 2022.7.1),
        hdfs (fsspec = 2022.7.1, pyarrow = 8.0.0),
        webhdfs (fsspec = 2022.7.1),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.5.3),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.5.3),
        s3 (s3fs = 2022.7.1, boto3 = 1.21.21),
        ssh (sshfs = 2022.6.0),
        oss (ossfs = 2021.8.0),
        webdav (webdav4 = 0.9.7),
        webdavs (webdav4 = 0.9.7)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/mapper/ubuntu--vg-ubuntu--lv
Caches: local
Remotes: None
Workspace directory: ext4 on /dev/mapper/ubuntu--vg-ubuntu--lv
Repo: dvc (no_scm)
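
The two dvc doctor outputs above are nearly identical, so it can help to diff the `package = version` pairs mechanically. The sketch below is only an illustration (the helper names `parse_versions` and `diff_versions` are hypothetical). A differing version is not necessarily the cause of the regression; it only narrows down what changed between the two installs.

```python
import re
from typing import Dict, Tuple


def parse_versions(doctor_text: str) -> Dict[str, str]:
    """Collect `package = version` pairs from `dvc doctor` output text."""
    return dict(re.findall(r"([A-Za-z0-9_.-]+) = ([0-9][\w.]*)", doctor_text))


def diff_versions(old: str, new: str) -> Dict[str, Tuple[str, str]]:
    """Return {package: (old_version, new_version)} for packages that differ."""
    old_v, new_v = parse_versions(old), parse_versions(new)
    return {
        pkg: (old_v[pkg], new_v[pkg])
        for pkg in sorted(old_v.keys() & new_v.keys())
        if old_v[pkg] != new_v[pkg]
    }
```

Applied to the 2.16.0 and 2.17.0 outputs in this report, the only differing packages are pyarrow (8.0.0 vs 9.0.0) and aiohttp-retry (2.5.3 vs 2.6.1).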

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

alealv commented, Aug 17, 2022

Hi, thanks for the fast fix. Unfortunately, I won’t be able to test it for the next two weeks. I’m on holiday 😁

skshetry commented, Aug 13, 2022

dvc get-url is also broken for HTTP remotes. I am not sure whether other remotes are affected as well.
