Thread-safe file import on S3


I often find that importing large files at the beginning of my workflow is a major time bottleneck. For instance, if I want to map a pair of fastq files from the EBI FTP site, it can take several hours to import each one using toil.importFile(). The same goes for, say, the phased chromosome 1000 Genomes VCFs.

This time could be cut in half if I could import the two fastq files at once. This actually works fine on a local job store, but does not work on an S3 job store. If I recall correctly, it's because the boto session object is not thread-safe. But I don't think it would be a major change to use multiple sessions (or maybe there's a thread-safe API somewhere?). It would be a big speedup for importing large files.
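To illustrate the "multiple sessions" idea, here is a minimal sketch, not Toil's actual implementation. It gives each worker thread its own session object instead of sharing one. The names `run_with_thread_sessions`, `boto3_session`, and the `work(session, item)` callback are hypothetical; the only library call assumed is `boto3.session.Session()`, imported lazily so the sketch loads even where boto3 is not installed.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def boto3_session():
    # Assumption: boto3 is installed. Imported lazily so this sketch can be
    # loaded (and tested with a fake factory) without it.
    import boto3
    return boto3.session.Session()


def run_with_thread_sessions(items, work, factory=boto3_session, max_workers=2):
    """Call work(session, item) for each item, one session per worker thread.

    `work` is a hypothetical callback that does the actual transfer.
    Results come back in the same order as `items`.
    """
    local = threading.local()  # per-thread storage, fresh for this call

    def task(item):
        # Create the session the first time this thread needs one, then
        # reuse it for every later item handled by the same thread.
        if not hasattr(local, "session"):
            local.session = factory()
        return work(local.session, item)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, items))
```

With two fastq files and `max_workers=2`, each file would be handled by its own thread and its own session, so no session object is ever touched by two threads at once.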

Coupling this with some kind of asynchronous / batch import interface would be even better.
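One possible shape for such a batch interface is sketched below. It is an assumption for illustration only, not an existing Toil API: `import_files_batch` is a hypothetical name, and `import_file` stands in for whatever single-file import call is available (for example, `toil.importFile`). The sketch only helps if that call is safe to run from several threads, which is exactly what this issue says is not yet the case on an S3 job store.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def import_files_batch(import_file, urls, max_workers=4):
    """Import all URLs concurrently.

    Returns (results, errors): two dicts keyed by URL, holding the return
    value of import_file(url) or the exception it raised.
    Assumes import_file may be called from several threads at once.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the URL it was submitted for.
        futures = {pool.submit(import_file, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # Keep going so one failed download does not lose the rest.
                errors[url] = exc
    return results, errors
```

Collecting failures per URL rather than raising on the first one means a multi-hour batch does not have to be restarted from scratch when a single transfer fails.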

Issue is synchronized with this Jira Story. Issue Number: TOIL-380

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

unito-bot commented, May 17, 2021

➤ Adam Novak commented:

Lon says he is fixing this.


