
read_csv from S3 times out

See original GitHub issue
import dask.dataframe

data = dask.dataframe.read_csv('s3://test/test.csv')

times out with

ConnectTimeout: HTTPSConnectionPool(host='sts.amazonaws.com', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<botocore.awsrequest.AWSHTTPSConnection object at 0x7f999ca2c790>, 'Connection to sts.amazonaws.com timed out. (connect timeout=60)'))

However, the following works:

import s3fs

# Listing the same bucket directly with s3fs succeeds in the same environment
fs = s3fs.S3FileSystem()
fs.ls('test')

I am on an EC2 instance that has an IAM role with S3FullAccess attached.
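Not part of the issue itself, but a common workaround sketch for this symptom in current dask/s3fs versions, which forward storage_options to s3fs.S3FileSystem and config_kwargs on to botocore.client.Config; the timeout values below are arbitrary:

import dask.dataframe

# Short botocore timeouts make a failed credential/STS lookup surface
# quickly instead of hanging for the default 60 seconds.
data = dask.dataframe.read_csv(
    's3://test/test.csv',
    storage_options={'config_kwargs': {'connect_timeout': 5, 'read_timeout': 60}},
)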

Issue Analytics

  • State: closed
  • Created: 7 years ago
  • Comments: 40 (40 by maintainers)

Top GitHub Comments

1 reaction
martindurant commented, May 25, 2016

Seems to be this: botocore.utils.InstanceMetadataFetcher().retrieve_iam_role_credentials() https://github.com/boto/botocore/blob/develop/botocore/utils.py#L144
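A quick way to exercise that same call path directly on the instance (a sketch; the timeout and num_attempts values are arbitrary):

from botocore.utils import InstanceMetadataFetcher

# Run on the EC2 instance itself. An empty dict means the instance metadata
# service did not return role credentials, so the same lookup inside
# s3fs/botocore would stall or fall through to sts.amazonaws.com.
fetcher = InstanceMetadataFetcher(timeout=2, num_attempts=2)
creds = fetcher.retrieve_iam_role_credentials()
print(creds.get('role_name'), bool(creds.get('access_key')))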

We still don’t have a way to know if we should distribute (temporary) credentials or not. I see that you suggest the user should have set their environment/config files/IAM roles correctly before starting. That’s what we initially had done too. But if the user provides the key/secret as overrides, sending these to workers might be fine or might be dangerous.
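For reference, the key/secret override path discussed here looks like this in today's dask API (a sketch; the credentials are placeholders, and note they are serialized into the task graph that every worker receives):

import dask.dataframe

# Explicit overrides travel to all workers with the graph; this is exactly
# the convenience/risk trade-off described above. Placeholder values only.
data = dask.dataframe.read_csv(
    's3://test/test.csv',
    storage_options={'key': 'AKIA...', 'secret': '...'},
)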

0 reactions
hussainsultan commented, Jun 10, 2016

@martindurant It was user error; I wasn't pulling it from the conda-forge channel correctly. It works now. Thank you!

Read more comments on GitHub.

Top Results From Across the Web

  • dask read_csv timeout on Amazon s3 with big files
    I think we need a way to set s3fs.S3FileSystem.read_timeout on the remote workers, not the local code, but I have no idea how... (one approach is sketched after this list)
  • How to read and process large csv files from s3 bucket using ...
    I'm trying to read a large csv file from s3 bucket using "StreamReader". But stream reader is timing out if the file has... (a Python equivalent is sketched after this list)
  • Resolve issues with uploading large files in Amazon S3
    I'm trying to upload a large file using the Amazon S3 console. ... The Amazon S3 console might time out during large uploads...
  • Processing Large S3 Files With AWS Lambda - Medium
    The main approach is as follows: Read and process the csv file row by row until nearing timeout. Trigger a new lambda asynchronously... (sketched after this list)
  • awswrangler.s3.read_csv — AWS SDK for pandas 2.18.0 ...
    awswrangler.s3.read_csv ... Read CSV file(s) from a received S3 prefix or list of S3 objects paths. This function accepts Unix shell-style wildcards in... (a minimal call is sketched after this list)
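The first result asks how to set s3fs.S3FileSystem.read_timeout on remote workers rather than locally. One way, assuming an older s3fs that exposes the timeouts as class attributes and a running dask.distributed cluster (scheduler address assumed):

import s3fs
from dask.distributed import Client

client = Client('tcp://scheduler:8786')  # assumed scheduler address

def raise_s3_timeouts():
    # Runs on each worker. Older s3fs versions build their botocore Config
    # from these class attributes, so set them before any S3FileSystem is
    # instantiated on the worker.
    s3fs.S3FileSystem.connect_timeout = 30
    s3fs.S3FileSystem.read_timeout = 120

client.run(raise_s3_timeouts)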
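The StreamReader result is about .NET, but the same row-at-a-time idea in Python with boto3's streaming body looks roughly like this (bucket and key assumed; naive comma split for brevity):

import boto3

body = boto3.client('s3').get_object(Bucket='test', Key='test.csv')['Body']

# iter_lines() streams the HTTP body chunk by chunk, so memory use stays
# flat regardless of object size.
for line in body.iter_lines():
    fields = line.decode('utf-8').split(',')  # no quoted-field handling here
    print(fields[0])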
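The Medium post's approach, sketched against the real Lambda context API (process_next_row and the payload shape are assumptions):

import json

import boto3

def handler(event, context):
    offset = event.get('offset', 0)
    # Work until about 10 seconds of budget remain, then re-trigger.
    while offset is not None and context.get_remaining_time_in_millis() > 10_000:
        offset = process_next_row(offset)  # hypothetical; returns None when done
    if offset is not None:
        boto3.client('lambda').invoke(
            FunctionName=context.function_name,
            InvocationType='Event',  # asynchronous, fire-and-forget
            Payload=json.dumps({'offset': offset}),
        )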
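And for comparison with the original failing call, the awswrangler equivalent from the last result (path reused from the issue):

import awswrangler as wr

# Accepts a single key, a prefix, a list of keys, or shell-style wildcards,
# and returns a pandas DataFrame.
df = wr.s3.read_csv('s3://test/test.csv')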
