dvc get: S3 timeout error when trying to download files
Bug Report
Description
I have several files tracked with DVC in an S3 bucket. When I try to download these files with `dvc get`, it throws one of the following errors at some point, and only a couple of the files end up downloaded:
ERROR: unexpected error - Connect timeout on endpoint URL: "http://<ip_machine>/latest/meta-data/iam/security-credentials/<role_aws>"
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
ERROR: unexpected error - Connect timeout on endpoint URL: "http://<ip_machine>/latest/meta-data/iam/security-credentials/"
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
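The path in both errors, `/latest/meta-data/iam/security-credentials/`, is the EC2 instance metadata path from which boto looks up IAM role credentials when none are configured explicitly. botocore gives that lookup a 1-second timeout and a single attempt by default; both can be raised through the `AWS_METADATA_SERVICE_TIMEOUT` and `AWS_METADATA_SERVICE_NUM_ATTEMPTS` environment variables. Whether raising them avoids this failure is an assumption not confirmed in this thread, and the values 5 and 3 below are arbitrary. A minimal sketch that runs `dvc get` with those variables set (`metadata_env` and `dvc_get` are hypothetical helpers):

```python
import os
import subprocess
from typing import Dict, Mapping, Optional


def metadata_env(timeout_s: int = 5, attempts: int = 3,
                 base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with botocore's instance-metadata
    timeout and retry count raised (botocore parses both as integers)."""
    env = dict(os.environ if base is None else base)
    env["AWS_METADATA_SERVICE_TIMEOUT"] = str(timeout_s)
    env["AWS_METADATA_SERVICE_NUM_ATTEMPTS"] = str(attempts)
    return env


def dvc_get(repo_url: str, path: str, env: Mapping[str, str]) -> int:
    """Run `dvc get <repo_url> <path>` with the given environment; return its exit code."""
    return subprocess.run(["dvc", "get", repo_url, path], env=dict(env)).returncode
```

Usage would look like `dvc_get("<repo-url>", "<path>", metadata_env())`; exporting the same two variables in the shell before calling `dvc get` should be equivalent.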
However, this does not happen if:
- I download the data by cloning the Git repo and running `dvc pull` on it.
- I run the `dvc get` command with the parameter `-j 1`.
- I download the data using `aws s3 cp` (without DVC).
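`-j`/`--jobs` sets how many download jobs `dvc get` runs in parallel. One plausible explanation for why `-j 1` helps, not confirmed in this thread, is that with many parallel jobs several workers query the metadata endpoint for credentials at the same moment and some of those requests time out. The sketch below only illustrates bounded parallelism; `download_all` and `ConcurrencyProbe` are hypothetical and are not DVC internals.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def download_all(keys: Iterable[str], fetch: Callable[[str], bytes], jobs: int) -> List[bytes]:
    """Fetch every key with at most `jobs` calls in flight, analogous to `-j <jobs>`."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # map() keeps input order and re-raises a worker's exception when its result is reached
        return list(pool.map(fetch, keys))


class ConcurrencyProbe:
    """Hypothetical stand-in for a per-file download that records the peak
    number of overlapping calls (e.g. simultaneous credential lookups)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __call__(self, key: str) -> bytes:
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.01)  # simulate network latency
        with self._lock:
            self.active -= 1
        return key.encode()
```

With `jobs=1` the probe never sees more than one call in flight, which mirrors the serial behaviour of `dvc get -j 1` at the cost of a slower download.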
Reproduce
- Track a dataset with DVC in an S3 bucket.
- Run `dvc get` to download the tracked dataset.
Expected
All files are downloaded, not just a couple of them.
Environment information
$ dvc doctor
DVC version: 2.10.2 (pip)
---------------------------------
Platform: Python 3.9.12 on Linux-5.13.0-1023-aws-x86_64-with-glibc2.34
Supports:
webhdfs (fsspec = 2022.5.0),
http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
s3 (s3fs = 2022.5.0, boto3 = 1.21.21)
Top GitHub Comments
Leaving this here since it might be useful for people searching for this error.
We had the exact same error message when we tried to reference files in a data registry from a different repo using …
This was because, when importing from the remote repository, DVC could not authenticate against the S3 storage: we kept our credentials in a .dvc/config.local file in the registry repository, and that file is git-ignored, so it is never pushed to the Git remote. One solution is to use `--global` when adding remotes and secrets, so that they are configured globally on the machine. See https://github.com/iterative/dvc/issues/4858 for more info about the `config.local` issue.

@Madrueno Are you using regular AWS S3 or some S3-compatible storage? Do you have an endpoint configured?
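The first comment's explanation can be modeled with DVC's configuration levels (system, global, repo, and local), where each later level overrides the earlier ones. Because `.dvc/config.local` is git-ignored, a fresh clone of the registry has no local level, so S3 credentials stored only there are lost, and boto may then fall back to other credential sources such as the instance metadata endpoint seen in the errors above. This is a simplified model, not DVC's implementation: `merge_levels` and `has_credentials` are hypothetical, while `access_key_id` and `secret_access_key` are real DVC S3 remote options.

```python
from typing import Dict, Optional

Section = Dict[str, str]
Config = Dict[str, Section]

# Lowest to highest precedence: a later level overrides an earlier one.
LEVELS = ("system", "global", "repo", "local")


def merge_levels(levels: Dict[str, Optional[Config]]) -> Config:
    """Merge per-level configs; a missing level (None or absent) contributes nothing."""
    merged: Config = {}
    for name in LEVELS:
        for section, options in (levels.get(name) or {}).items():
            merged.setdefault(section, {}).update(options)
    return merged


def has_credentials(cfg: Config, remote: str) -> bool:
    """True if the named remote section carries both S3 key options."""
    section = cfg.get(f'remote "{remote}"', {})
    return "access_key_id" in section and "secret_access_key" in section
```

In this model, moving the two options to the global level, as the comment suggests with `--global`, makes `has_credentials` true again even when `config.local` is absent.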