Feature request: S3 file access
Is S3 file access already supported? I could not find it mentioned in the documentation.
It was previously mentioned in this issue: https://github.com/common-workflow-language/cwltool/issues/539
However, that issue was closed after http access was implemented. As far as I can tell, that is not sufficient when you need to supply an S3 access key and secret key to access the files.
As an example, Nextflow has this feature implemented, described here: https://www.nextflow.io/docs/latest/amazons3.html, so I was hoping to find an equivalent in cwltool.
Thanks!
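For context, a minimal sketch in Python of what "supplying an access key and secret key" looks like with boto3; the credentials, bucket, and object names below are placeholders, not anything cwltool provides today:

    import boto3

    # Placeholder credentials; in practice these would come from the
    # environment or a config file rather than being hard-coded.
    s3 = boto3.client(
        "s3",
        aws_access_key_id="AKIAEXAMPLEKEY",
        aws_secret_access_key="exampleSecretKey",
    )

    # Fetch a (hypothetical) private object that plain http access cannot reach.
    s3.download_file("my-private-bucket", "inputs/sample.bam", "/tmp/sample.bam")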
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
S3 is not supported in cwltool. However, other CWL runners do support S3; check out toil-cwl-runner: https://github.com/DataBiosphere/toil
However, if you or someone else is interested in adding S3 support to cwltool directly, it would be pretty easy. Here is where the file download happens:
https://github.com/common-workflow-language/cwltool/blob/main/cwltool/pathmapper.py#L142
https://github.com/common-workflow-language/cwltool/blob/ac60dc1df0c23e54ecee99bc0d989da410851d2e/cwltool/utils.py#L426
So you could do something like import boto3 and add a downloadS3file function that is called when cwltool sees s3 URLs. The http support uses the CacheControl library for local caching so files are not re-downloaded for every run; you would probably want something similar for S3. To be clear, I'm also not suggesting anything beyond the bare minimum of what cwltool already does for plain http, which is to download files to the local filesystem right at the start.
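For illustration, here is a minimal sketch of the kind of helper the comment describes, assuming boto3 is installed; the function name download_s3_file and the destination handling are hypothetical and not part of cwltool:

    from urllib.parse import urlparse

    import boto3


    def download_s3_file(url: str, destination: str) -> str:
        """Download an s3:// URL to a local path and return that path.

        Credentials are resolved through boto3's normal lookup chain
        (environment variables, ~/.aws/credentials, instance profile).
        """
        parsed = urlparse(url)          # e.g. s3://my-bucket/path/to/file.txt
        bucket = parsed.netloc          # "my-bucket"
        key = parsed.path.lstrip("/")   # "path/to/file.txt"

        s3 = boto3.client("s3")
        s3.download_file(bucket, key, destination)
        return destination


    # Hypothetical usage: fetch an input file before the tool runs.
    # local_path = download_s3_file("s3://my-bucket/inputs/reads.fastq", "/tmp/reads.fastq")

Unlike the CacheControl-backed http fetching mentioned above, this sketch does no local caching, so a real implementation would want to skip the download when an unchanged copy already exists on disk.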