`pull`: `-R` does not check immediate target

Bug Report

Description

Firstly, from the docs I realise that pull -R <target> is probably working exactly as advertised.

In the VS Code extension, we show a tracked tree which can be used to selectively pull files from the remote.

We currently use the output of dvc list . -R --show-json --dvc-only to generate this tree (we will shortly be using the output from the new data:status command). We mark everything provided by the list output as tracked.

When calling pull against these tracked paths we check to see if the path exists in the list output. If it does then we call dvc pull <target>. If it does not we call dvc pull -R <target>.
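The dispatch rule described above can be sketched roughly as follows. This is only an illustration, not the extension's actual code: the helper names `listed_paths` and `build_pull_command` are hypothetical, the sample paths are made up, and the sketch assumes the `--show-json` output is a JSON array of objects that each carry a `path` key.

```python
from __future__ import annotations

import json


def listed_paths(list_json: str) -> set[str]:
    """Collect the paths reported by `dvc list . -R --show-json --dvc-only`.

    Assumption: the output is a JSON array of objects with a "path" key.
    """
    return {entry["path"] for entry in json.loads(list_json)}


def build_pull_command(target: str, tracked: set[str]) -> list[str]:
    """Return the argv used to pull `target`, following the rule above."""
    if target in tracked:
        # The path appears in the list output, so pull it directly.
        return ["dvc", "pull", target]
    # Otherwise fall back to the recursive form.
    return ["dvc", "pull", "-R", target]


# Made-up list output for illustration only.
sample = '[{"path": "data/data.xml"}, {"path": "model.pt"}]'
tracked = listed_paths(sample)
print(build_pull_command("model.pt", tracked))
print(build_pull_command("training_metrics", tracked))
```

With this rule, a directory such as training_metrics that never appears in the list output always ends up on the `-R` branch, which is where the problem described below shows up.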

When calling dvc pull -R we get mixed results. Here is an example of -R reporting that everything is up to date even though no data has been pulled:

https://user-images.githubusercontent.com/37993418/168737919-52548709-2a98-4f30-8658-53bd16c2b709.mov

dvc.yaml for the above project is here. training_metrics is tracked but there is no way currently for us to easily/consistently tell this from the combined output of list, status & diff.

Reproduce

  1. Open demo project for the first time.
  2. Run dvc pull -R training_metrics from the root.
  3. “everything is up to date” will be returned by the command
  4. No data will have been updated.

Expected

dvc pull -R <target> checks the target itself as well as searching inside the target.
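To make the difference concrete, here is a toy model of the two selection rules. It is a simplified sketch and not DVC's implementation: it assumes the reported behaviour amounts to "only look at .dvc / dvc.yaml files located inside the target", and the function names and the project layout are hypothetical.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def is_within(path: str, directory: str) -> bool:
    """True if `path` equals `directory` or lies somewhere beneath it."""
    p, d = PurePosixPath(path), PurePosixPath(directory)
    return p == d or d in p.parents


def select_current(outs_by_file: dict[str, list[str]], target: str) -> set[str]:
    """Assumed model of the reported behaviour: only outputs declared in
    .dvc / dvc.yaml files located inside `target` are selected."""
    return {
        out
        for dvcfile, outs in outs_by_file.items()
        if is_within(dvcfile, target)
        for out in outs
    }


def select_expected(outs_by_file: dict[str, list[str]], target: str) -> set[str]:
    """Requested behaviour: also select any tracked output that is the
    target itself or lives inside it, wherever it was declared."""
    selected = select_current(outs_by_file, target)
    for outs in outs_by_file.values():
        selected.update(out for out in outs if is_within(out, target))
    return selected


# Hypothetical layout: training_metrics is an output of the root dvc.yaml.
layout = {
    "dvc.yaml": ["training_metrics", "model.pt"],
    "data/MNIST/raw.dvc": ["data/MNIST/raw"],
}
print(select_current(layout, "training_metrics"))
print(select_expected(layout, "training_metrics"))
```

In this toy layout the first call selects nothing, because no dvc.yaml or .dvc file lives inside training_metrics, while the second selects training_metrics itself.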

We could take the alternative approach of including the appropriate information in the new data:status command, i.e. training_metrics/ would be provided as part of the output to let us know that it is tracked.

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.10.2 (pip)
---------------------------------
Platform: Python 3.9.9 on macOS-12.3.1-x86_64-i386-64bit
Supports:
        webhdfs (fsspec = 2022.3.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        s3 (s3fs = 2022.3.0, boto3 = 1.21.21)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk1s5s1
Caches: local
Remotes: s3
Workspace directory: apfs on /dev/disk1s5s1
Repo: dvc (subdir), git

Additional Information (if any):

Please let me know if you need anything else from me. Thanks

Issue Analytics

  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 34 (32 by maintainers)

Top GitHub Comments

3 reactions
shcheklein commented, May 20, 2022

@skshetry yep, usually I’m testing on the dev version, in this case dvc was coming from a different project. It is improved to 0.21! 🎉

2 reactions
dberenbaum commented, May 20, 2022

yep, I right click on data, ask to pull, and expect it to bring me data/MNIST/raw inside, w/o me going two levels down (I just don’t even know which one is tracked). Or in the example-get-started I’d like to do dvc pull -R data to download data.xml and some intermediate things inside … again, it’s all about a simple and intuitive interface to manipulate data.

Current behavior is not useful at all to my mind and comes from some legacy (when dvc pull was taking only .dvc files as targets).

Yup, makes sense. We need to move all our commands towards operating on all DVC-tracked data within a path without the users worrying about where the paths are specified in .dvc, dvc.yaml, etc. I think the current dvc pull -R logic is how most DVC commands work, so I would like to have a more systematic effort to change it across commands rather than have inconsistent behavior.

I think it’s somewhat related to the goal to “auto manage directories,” which is currently planned for Q3, and dvc data status is planned with this in mind.

@shcheklein @mattseddon What is the priority for VS Code (when do you need it)?

@efiop Any thoughts?
