
Feature request: S3 file access


Is S3 file access already supported? I could not find it mentioned in the documentation.

It was previously mentioned in this issue:

https://github.com/common-workflow-language/cwltool/issues/539

However, that one was closed after HTTP access was implemented. As far as I can tell, though, HTTP access is not sufficient when you need to supply an S3 access key and secret key to access the files.
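One possible workaround in the meantime, assuming boto3 is installed and AWS credentials are configured locally, is to generate a presigned URL and hand that to cwltool's existing HTTP support; the bucket and key names below are hypothetical:

```python
# Hedged sketch: turn a private S3 object into a time-limited HTTPS URL
# that cwltool's existing HTTP download path can fetch.
# "my-bucket" and "inputs/reads.fastq" are hypothetical examples.
import boto3

s3 = boto3.client("s3")  # credentials come from env vars, ~/.aws, or IAM
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-bucket", "Key": "inputs/reads.fastq"},
    ExpiresIn=3600,  # URL stays valid for one hour
)
print(url)  # use this https:// URL as the File location in the job inputs
```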

As an example, Nextflow has this feature implemented, described here: https://www.nextflow.io/docs/latest/amazons3.html

So I was hoping to find an equivalent in cwltool.

Thanks!

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

tetron commented, Dec 10, 2020 (1 reaction)

S3 is not supported in cwltool. However, other CWL runners do support S3; check out toil-cwl-runner:

https://github.com/DataBiosphere/toil

However, if you or someone else is interested in adding S3 support to cwltool directly, it would be pretty easy. Here's where the file download happens:

https://github.com/common-workflow-language/cwltool/blob/main/cwltool/pathmapper.py#L142

https://github.com/common-workflow-language/cwltool/blob/ac60dc1df0c23e54ecee99bc0d989da410851d2e/cwltool/utils.py#L426

So you could do something like import boto3 and add a downloadS3file function that is invoked when cwltool sees s3:// URLs.
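A minimal sketch of what that helper might look like; the name downloadS3file comes from the comment above, while the signature and destination-path handling are assumptions rather than cwltool's actual API:

```python
# Minimal sketch of the suggested helper. downloadS3file is the name
# proposed above; the signature and surrounding details are illustrative
# assumptions, not cwltool's actual internals.
from urllib.parse import urlparse

import boto3


def downloadS3file(url: str, destination: str) -> None:
    """Download an s3://bucket/key URL to a local path."""
    parsed = urlparse(url)           # s3://<bucket>/<key>
    bucket = parsed.netloc
    key = parsed.path.lstrip("/")
    s3 = boto3.client("s3")          # credentials from env vars, ~/.aws, or IAM
    s3.download_file(bucket, key, destination)
```

Dispatching to it would then be a matter of checking for the s3:// scheme at the download sites linked above.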

The HTTP support uses the CacheControl library for local caching so that files are not re-downloaded on every run; you would probably want something similar for S3.
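For reference, the basic CacheControl pattern for HTTP looks roughly like this (a sketch of the library's documented usage, not cwltool's exact code; the cache directory name is arbitrary). An S3 analogue could, for instance, compare the object's ETag against a cached copy before re-downloading:

```python
# Sketch of the CacheControl pattern for cached HTTP downloads;
# the cache directory name is an arbitrary example.
import requests
from cachecontrol import CacheControl
from cachecontrol.caches import FileCache

session = CacheControl(requests.Session(), cache=FileCache(".http_cache"))
response = session.get("https://example.com/inputs/reads.fastq")
# Repeated GETs are served from .http_cache when the response's
# cache headers allow it, instead of hitting the network again.
```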

tetron commented, Dec 11, 2020 (0 reactions)

To be clear, I'm also not suggesting anything beyond the bare minimum of what cwltool already does for plain HTTP, which is to download files to the local filesystem right at the start.

