checkout: partial directory cache
Currently, if you have lost some of the cache files for part of a directory for some reason, dvc throws an assertion error:
ERROR: failed to pull data from the cloud
------------------------------------------------------------
Traceback (most recent call last):
  File "/var/akonshin/.virtualenvs/tr/lib/python3.5/site-packages/dvc/command/data_sync.py", line 46, in do_run
    recursive=self.args.recursive,
  File "/var/akonshin/.virtualenvs/tr/lib/python3.5/site-packages/dvc/repo/pull.py", line 27, in pull
    target=target, with_deps=with_deps, force=force, recursive=recursive
  File "/var/akonshin/.virtualenvs/tr/lib/python3.5/site-packages/dvc/repo/checkout.py", line 54, in checkout
    stage.checkout(force=force, progress_callback=progress_callback)
  File "/var/akonshin/.virtualenvs/tr/lib/python3.5/site-packages/dvc/stage.py", line 822, in checkout
    force=force, tag=self.tag, progress_callback=progress_callback
  File "/var/akonshin/.virtualenvs/tr/lib/python3.5/site-packages/dvc/output/base.py", line 228, in checkout
    progress_callback=progress_callback,
  File "/var/akonshin/.virtualenvs/tr/lib/python3.5/site-packages/dvc/remote/base.py", line 370, in checkout
    progress_callback=progress_callback,
  File "/var/akonshin/.virtualenvs/tr/lib/python3.5/site-packages/dvc/remote/local.py", line 353, in do_checkout
    self.link(c, p)
  File "/var/akonshin/.virtualenvs/tr/lib/python3.5/site-packages/dvc/remote/local.py", line 155, in link
    assert os.path.isfile(cache)
AssertionError
------------------------------------------------------------
This is because, unlike with regular data files, we don’t check that the cache file exists before linking it when checking out a directory. For standalone data files we print a warning that “cache file doesn’t exist and file is not going to be created”, so we need to do something similar here.
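As a rough illustration of the proposed behaviour (not the actual dvc code or the fix that was merged), the check could look something like the sketch below. The names `link_if_cached` and `checkout_dir` are hypothetical, and a plain file copy stands in for whatever link type dvc would really use.

```python
import logging
import os
import shutil

logger = logging.getLogger(__name__)


def link_if_cached(cache, path):
    """Hypothetical helper: create `path` from `cache` only if the cache file exists."""
    if not os.path.isfile(cache):
        # Warn and skip instead of failing on `assert os.path.isfile(cache)`.
        logger.warning(
            "cache file '%s' doesn't exist, '%s' is not going to be created",
            cache,
            path,
        )
        return False
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    # Assumption: a copy stands in for the real link (hardlink, symlink, reflink).
    shutil.copyfile(cache, path)
    return True


def checkout_dir(entries):
    """Hypothetical helper: check out (cache, path) pairs, return the skipped paths."""
    return [path for cache, path in entries if not link_if_cached(cache, path)]
```

With something along these lines, a directory whose cache is only partially present would be checked out as far as possible, and the caller would get back the list of files that could not be created rather than an AssertionError.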
Some backstory: a user on ODS was getting this error, and it turned out that he had run dvc gc -c for some time and then interrupted it, so parts of the data were removed. When he later tried to dvc pull, he got this error.
Issue Analytics
- State:
- Created 4 years ago
- Reactions:3
- Comments:7 (7 by maintainers)
The cache optimization mentioned by @efiop was introduced in #1526.
Closed with https://github.com/iterative/dvc/pull/2090