UX issue with dvc pull - does not pull entire remote cache

Please provide information about your setup

Mac OS X with:

$ dvc --version
0.35.7

The issue is replicable using the Getting Started workspace.

When set up using these commands:

$ git clone https://github.com/iterative/example-get-started
$ cd example-get-started/
$ pip install -r requirements.txt
$ dvc pull

It appears the cache is incomplete. This is observed by checking out different Git tags and attempting to use dvc checkout.

$ git tag
0-empty
1-initialize
2-remote
3-add-file
4-sources
5-preparation
6-featurization
7-train
8-evaluation
9-bigrams
baseline-experiment
bigrams-experiment

$ git checkout 7-train
Note: checking out '7-train'.

$ dvc status
featurize.dvc:
	changed outs:
		not in cache:       data/features
train.dvc:
	changed deps:
		modified:           data/features
	changed outs:
		not in cache:       model.pkl

$ dvc checkout
ERROR: Failed to load dir cache '.dvc/cache/33/38d2c21bdb521cda0ba4add89e1cb0.dir' - [Errno 2] No such file or directory: '/Volumes/Extra/dvc/example-get-started/.dvc/cache/33/38d2c21bdb521cda0ba4add89e1cb0.dir'

Having any troubles?. Hit us up at https://dvc.org/support, we are always happy to help!
WARNING: Cache 'a66489653d1b6a8ba989799367b32c43' not found. File '{'scheme': 'local', 'path': '/Volumes/Extra/dvc/example-get-started/model.pkl'}' won't be created.
WARNING: Cache '3338d2c21bdb521cda0ba4add89e1cb0.dir' not found. File '{'scheme': 'local', 'path': '/Volumes/Extra/dvc/example-get-started/data/features'}' won't be created.
[##############################] 100% Checkout finished!

$ ls .dvc/cache
38	42	58	68	9d	a3	aa	dc

Notice that the directory .dvc/cache/33 is not there, just as the error message says.
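The error and the ls output above suggest how a checksum maps to a cache location: the first two characters name the subdirectory and the remainder names the file. A minimal sketch of that mapping follows, assuming the layout is exactly what the error message shows. The `cache_path` helper is hypothetical and not part of dvc.

```shell
# Hypothetical helper (not a dvc command): map a checksum to its expected
# location in the local cache, following the layout seen in the error above.
# Usage: cache_path CHECKSUM [CACHE_DIR]
cache_path() {
  sum=$1
  rest=${sum#??}            # everything after the first two characters
  prefix=${sum%"$rest"}     # the first two characters
  printf '%s/%s/%s\n' "${2:-.dvc/cache}" "$prefix" "$rest"
}
```

For example, `cache_path 3338d2c21bdb521cda0ba4add89e1cb0.dir` prints `.dvc/cache/33/38d2c21bdb521cda0ba4add89e1cb0.dir`, and `[ -e "$(cache_path a66489653d1b6a8ba989799367b32c43)" ]` checks whether the cache entry for model.pkl is present.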

If the workspace is instead initialized with dvc fetch -T or dvc fetch -aT, dvc checkout does not fail.
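The workaround can be wrapped in a small function for reuse when setting up a fresh clone. This is only a sketch: the function name and the `DVC_BIN` override are hypothetical, added so the command can be dry-run without dvc installed. The `-aT` flags are the ones used in this report.

```shell
# Hypothetical wrapper: fetch cache for all branches (-a) and all tags (-T)
# instead of only the current workspace.
# DVC_BIN is an assumed override for dry runs; it defaults to the real dvc.
fetch_all_revisions() {
  "${DVC_BIN:-dvc}" fetch -aT
}
```

Running `fetch_all_revisions` right after cloning, in place of a bare dvc pull, should leave the cache populated for the tags listed above, so that git checkout 7-train followed by dvc checkout can find its files.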

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions:2
  • Comments:12 (10 by maintainers)

Top GitHub Comments

3 reactions
robogeek commented, Apr 19, 2019

There seems to be some confusion about my view on this. First, I filed it because @shcheklein asked me to do so 😉

But to me the issue is not about dvc checkout but about the behavior of dvc pull with no options.

My expectation was that dvc pull with no options would pull down all data files (ditto with dvc fetch).

I was honestly surprised to see that it had not. My initial assumption was that the remote cache used by the example repository was somehow incomplete. Then I noticed @jorgeorpinel had noted the exact same issue earlier in the week.

That two of us ran into the same problem suggests to me that the UX is not right. Back in the 1980s on Usenet we used the phrase “principle of least surprise”, which says that a program should behave in the way that least surprises the user. It’s not my call whether dvc pull with no options needs to change its behavior. I’m just saying that I was surprised by the current behavior.

By comparison, git pull with no options ensures that all commits are pulled from the remote repository.

1 reaction
efiop commented, Apr 20, 2019

Maybe an info message along the lines of “Pulling cache for the current workspace. To pull cache for the whole project see dvc pull -h.”? Would that help make it clearer?
