dvc pull returns "failed to pull data" when the data exists on remote
Bug Report
Issue name
dvc pull returns “failed to pull data” when the data exists on remote
Description
dvc pull (also tried with the -R option) fails to pull remote data based on .dvc files in sub-directories and returns ERROR: failed to pull data from the cloud - Checkout failed for following targets:.... However, when I run the pull command on the failed files individually, it succeeds.
(onboarding_models) radion@MacBook-Pro-Radion anna-datascience % dvc pull
WARNING: Some of the cache files do not exist neither locally nor on remote. Missing cache files:
name: document_labelling_utils/annotation_results/1000_recent_documents_20210413.json, md5: 06a0a6ef5b6446a33623a544ede8bbfd
1 file failed
ERROR: failed to pull data from the cloud - Checkout failed for following targets:
document_labelling_utils/annotation_results/1000_recent_documents_20210413.json
Is your cache up to date?
<https://error.dvc.org/missing-files>
(onboarding_models) radion@MacBook-Pro-Radion anna-datascience % dvc pull -R
WARNING: Some of the cache files do not exist neither locally nor on remote. Missing cache files:
name: document_labelling_utils/annotation_results/1000_recent_documents_20210413.json, md5: 06a0a6ef5b6446a33623a544ede8bbfd
1 file failed
ERROR: failed to pull data from the cloud - Checkout failed for following targets:
document_labelling_utils/annotation_results/1000_recent_documents_20210413.json
Is your cache up to date?
<https://error.dvc.org/missing-files>
(onboarding_models) radion@MacBook-Pro-Radion anna-datascience % dvc pull document_labelling_utils/annotation_results/1000_recent_documents_20210413.json.dvc
A document_labelling_utils/annotation_results/1000_recent_documents_20210413.json
1 file added and 1 file fetched
(onboarding_models) radion@MacBook-Pro-Radion anna-datascience % dvc pull
Everything is up to date.
Expected
I expect dvc pull to download missing files from sub-directories without the need to run it on each .dvc file.
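Until that works, the per-file workaround shown in the transcript above can be scripted. The following is only a minimal sketch, not part of DVC: the helper names find_dvc_files and pull_each are made up for illustration, and the only DVC command it relies on is dvc pull <target>, as used in the report. It defaults to a dry run that just lists the commands.

```python
import subprocess
from pathlib import Path


def find_dvc_files(root):
    """Return every *.dvc file under root, skipping DVC's internal .dvc/ directory.

    Hypothetical helper for illustration; not a DVC API.
    """
    root = Path(root)
    return sorted(
        p for p in root.rglob("*.dvc")
        # is_file() drops the .dvc directory itself; the parts check drops
        # anything stored inside a .dvc/ directory.
        if p.is_file() and ".dvc" not in p.relative_to(root).parts[:-1]
    )


def pull_each(root=".", dry_run=True):
    """Build one `dvc pull <target>` command per .dvc file and optionally run it."""
    root = Path(root)
    commands = [
        ["dvc", "pull", str(p.relative_to(root))] for p in find_dvc_files(root)
    ]
    if not dry_run:
        for cmd in commands:
            # check=False: keep going even if one target fails to pull.
            subprocess.run(cmd, cwd=root, check=False)
    return commands


if __name__ == "__main__":
    # Dry run: print the commands that would be executed from the repo root.
    for cmd in pull_each("."):
        print(" ".join(cmd))
```

Calling pull_each(".", dry_run=False) from the repository root would run the commands one by one. This only sidesteps the symptom described here; it does not address whatever makes the bulk dvc pull report the file as missing.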
Environment information
Output of dvc doctor:
DVC version: 2.7.2 (brew)
---------------------------------
Platform: Python 3.9.7 on macOS-11.2.1-x86_64-i386-64bit
Supports:
azure (adlfs = 2021.8.2, knack = 0.8.2, azure-identity = 1.6.1),
gdrive (pydrive2 = 1.9.3),
gs (gcsfs = 2021.8.1),
http (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.5),
https (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.5),
s3 (s3fs = 2021.8.1, boto3 = 1.17.106),
webdav (webdav4 = 0.9.1),
webdavs (webdav4 = 0.9.1)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk1s1s1
Caches: local
Remotes: gs
Workspace directory: apfs on /dev/disk1s1s1
Repo: dvc, git
Issue Analytics
- State:
- Created 2 years ago
- Comments:14 (2 by maintainers)
The issue was introduced in 2.5.0.
Is OK
Failed !!
Looks like removing TRAVERSE_PREFIX_LEN fixes my issue.
Tested on master.