dvc get: S3 timeout error when trying to download files


Bug Report

Description

I have several files tracked with dvc in an S3 bucket. When I try to download them with dvc get, it fails partway through with one of the following errors, so only a couple of the files are downloaded:

ERROR: unexpected error - Connect timeout on endpoint URL: "http://<ip_machine>/latest/meta-data/iam/security-credentials/<role_aws>

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
ERROR: unexpected error - Connect timeout on endpoint URL: "http://<ip_machine>/latest/meta-data/iam/security-credentials/"

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

However, this does not happen if:

  • I download the data by cloning the git repo and running dvc pull on it.
  • I run the dvc get command with the parameter -j 1.
  • I download the data by using aws s3 cp (without using dvc).
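Since the error does not occur with `-j 1`, one way to apply that workaround consistently is to wrap it in a shell function. This is a minimal sketch: the function name `dvc_get_serial` is hypothetical, and any options after the repository URL and path (for example `-o <dir>`) are passed through to `dvc get` unchanged. Running one job at a time presumably means fewer simultaneous credential lookups against the metadata endpoint shown in the error; that is an inference, not something confirmed in the issue, and it trades download speed for reliability rather than fixing the underlying timeout.

```shell
# Hypothetical wrapper around the reporter's workaround: run `dvc get`
# with a single job (-j 1) so files are downloaded one at a time.
# Usage: dvc_get_serial <repo-url> <path> [extra dvc get options...]
dvc_get_serial() {
  local repo_url="$1" target="$2"
  shift 2
  # Everything after the first two arguments is forwarded as-is.
  dvc get "$repo_url" "$target" -j 1 "$@"
}
```

For example, `dvc_get_serial <repo-url> <path> -o <dir>` runs `dvc get <repo-url> <path> -j 1 -o <dir>`.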

Reproduce

  1. Track a dataset with dvc and an S3 bucket
  2. Run dvc get to download the tracked dataset
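The two reproduction steps can be sketched as a shell function. This assumes it runs inside a fresh Git repository that contains a `data/` directory, and that the second argument is a Git URL the repository can be pushed to; the function name, the remote name `storage`, and the `data_copy` output directory are hypothetical. S3 credentials come from whatever the AWS SDK finds on the machine; the path in the reported error (`/latest/meta-data/iam/security-credentials/`) is the EC2 instance metadata path for IAM role credentials, which suggests the reporter relied on an instance role.

```shell
# Hypothetical reproduction of the two steps in the report.
# Assumes: current directory is a fresh Git repo containing data/,
# $1 is an S3 URL for the DVC remote, $2 is a pushable Git URL.
repro_dvc_get() {
  local s3_url="$1" repo_url="$2"

  # Step 1: track the dataset with dvc and store it in the S3 remote.
  dvc init
  dvc remote add -d storage "$s3_url"
  dvc add data            # writes data.dvc and adds /data to .gitignore
  git add -A
  git commit -m "Track data with DVC"
  dvc push
  git push "$repo_url" HEAD

  # Step 2: download the tracked dataset back with dvc get,
  # using the default job count (where the timeouts were reported).
  dvc get "$repo_url" data -o data_copy
}
```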

Expected

All files are downloaded, not just a couple of them.

Environment information

$ dvc doctor

DVC version: 2.10.2 (pip)
---------------------------------
Platform: Python 3.9.12 on Linux-5.13.0-1023-aws-x86_64-with-glibc2.34
Supports:
        webhdfs (fsspec = 2022.5.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        s3 (s3fs = 2022.5.0, boto3 = 1.21.21)

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments:15 (3 by maintainers)

Top GitHub Comments

3 reactions
wullli commented, Sep 7, 2022

Leaving this here since it might be useful for people searching for this error.

We had the exact same error message when we tried to reference files in a data-registry from a different repo using

dvc import git@gitlab-instance:registry.git path/to/file

This happened because DVC could not authenticate against the S3 storage when importing from the remote repository: our credentials were in a .dvc/config.local file in the registry repository, and that file is git-ignored, so it is never pushed to the Git remote. One solution is to use --global to add the remote and its secrets machine-wide:

$ dvc remote add --global myremote s3://<bucket>
$ dvc config --global remote.myremote.endpointurl <endpoint-url>
$ dvc config --global remote.myremote.access_key_id <access_key_id>
$ dvc config --global remote.myremote.secret_access_key <secret_access_key>

See https://github.com/iterative/dvc/issues/4858 for more info about the config.local issue.
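To check whether a registry repository is in the situation described above, a small shell predicate can look for S3 keys in `.dvc/config.local`. This is a minimal sketch: the function name `creds_in_local_config` is hypothetical, and it only greps for the `access_key_id` and `secret_access_key` option names used in the commands above. It does not parse the config file, and it ignores other credential sources such as environment variables or AWS profiles.

```shell
# Hypothetical check: succeed (exit status 0) if a DVC repo keeps S3 keys in
# .dvc/config.local. That file is git-ignored, so `dvc get` / `dvc import`
# run against the Git remote never see it.
creds_in_local_config() {
  local repo_dir="${1:-.}"
  local local_cfg="$repo_dir/.dvc/config.local"

  [ -f "$local_cfg" ] || return 1
  # Match option lines such as "    access_key_id = ..." (any indentation).
  if grep -Eq '^[[:space:]]*(access_key_id|secret_access_key)[[:space:]]*=' "$local_cfg"; then
    echo "S3 keys found in $local_cfg; set them with --global on machines that run dvc get/import"
    return 0
  fi
  return 1
}
```

For example, `creds_in_local_config path/to/registry` prints a hint and returns 0 when the keys are present in that repository's `config.local`.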

1 reaction
efiop commented, Jul 26, 2022

@Madrueno Are you using regular AWS S3 or some S3-compatible storage? Do you have an endpoint configured?
