read_csv from S3 times out
See original GitHub issue

```python
data = dask.dataframe.read_csv('s3://test/test.csv')
```

times out with

```
ConnectTimeout: HTTPSConnectionPool(host='sts.amazonaws.com', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<botocore.awsrequest.AWSHTTPSConnection object at 0x7f999ca2c790>, 'Connection to sts.amazonaws.com timed out. (connect timeout=60)'))
```
however, the following works:
```python
import s3fs
fs = s3fs.S3FileSystem()
fs.ls('test')
```
I am on an EC2 instance which has an IAM role attached with S3FullAccess.
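One workaround worth trying (a sketch, assuming the timeout happens in the botocore HTTP calls made from the workers): `dask.dataframe.read_csv` forwards `storage_options` to the `s3fs.S3FileSystem` it constructs, and s3fs' `config_kwargs` argument feeds `botocore.config.Config`, so larger timeouts can be pushed down that way. The bucket path is the one from the report; the 120-second values are arbitrary.

```python
# Sketch: forward larger botocore timeouts to the s3fs filesystem that
# dask builds on each worker. The keys below follow s3fs' documented
# constructor arguments; the timeout values are arbitrary choices.
storage_options = {
    # passed through to botocore.config.Config
    "config_kwargs": {"connect_timeout": 120, "read_timeout": 120},
}

# Usage (requires S3 access, so shown here as a comment):
# data = dask.dataframe.read_csv("s3://test/test.csv",
#                                storage_options=storage_options)
```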
Issue Analytics
- Created 7 years ago
- Comments: 40 (40 by maintainers)
Top Results From Across the Web

dask read_csv timeout on Amazon s3 with big files
I think we need a way to set s3fs.S3FileSystem.read_timeout on the remote workers, not the local code, but I have no idea how...

How to read and process large csv files from s3 bucket using ...
I'm trying to read a large csv file from s3 bucket using "StreamReader". But stream reader is timing out if the file has...

Resolve issues with uploading large files in Amazon S3
I'm trying to upload a large file using the Amazon S3 console. ... The Amazon S3 console might time out during large uploads...

Processing Large S3 Files With AWS Lambda - Medium
The main approach is as follows: Read and process the csv file row by row until nearing timeout. Trigger a new lambda asynchronously...

awswrangler.s3.read_csv — AWS SDK for pandas 2.18.0 ...
awswrangler.s3.read_csv¶ ... Read CSV file(s) from a received S3 prefix or list of S3 objects paths. This function accepts Unix shell-style wildcards in...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Seems to be this:

```python
botocore.utils.InstanceMetadataFetcher().retrieve_iam_role_credentials()
```

https://github.com/boto/botocore/blob/develop/botocore/utils.py#L144

We still don’t have a way to know if we should distribute (temporary) credentials or not. I see that you suggest the user should have set their environment/config files/IAM roles correctly before starting. That’s what we initially had done too. But if the user provides the key/secret as overrides, sending these to workers might be fine or might be dangerous.
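The "temporary credentials" distinction above can be made concrete with a small heuristic (a sketch; the `creds` dict shape here is an assumption, loosely mimicking what a credential resolver reports): role/STS credentials always carry a session token, while long-lived user access keys do not.

```python
# Sketch of a heuristic for the concern discussed above. The dict keys
# ("access_key", "secret_key", "token") are assumed names, not a real
# botocore API; real credential objects would need adapting.
def looks_temporary(creds):
    # Temporary (IAM-role / STS) credentials always include a session
    # token; long-lived user keys have no token at all.
    return bool(creds.get("token"))

# Placeholder values, not real keys.
role_creds = {"access_key": "ASIA...", "secret_key": "...", "token": "abc"}
user_creds = {"access_key": "AKIA...", "secret_key": "..."}
```

A scheduler could use a check like this to refuse to ship long-lived secrets to workers while still forwarding short-lived role credentials.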
@martindurant it was user error, I wasn’t pulling it from the conda-forge channel correctly. It works now. Thank you!