Thread-safe file import on S3

I often find that importing large files at the beginning of my workflow is a major time bottleneck. For instance, if I want to map a pair of fastq files from the EBI FTP site, it can take several hours to import each one using toil.importFile(). The same goes for, say, the phased chromosome 1000 Genomes VCFs.

This time could be cut in half if I could import the two fastq files at once. This actually works fine on a local job store, but does not work on an S3 job store. If I recall correctly, it’s because the boto session object is not thread-safe. But I don’t think it would be a major change to use multiple sessions (or maybe there’s a thread-safe API somewhere?). It would be a big speedup for importing large files.
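
As a rough illustration of the "multiple sessions" idea, here is a minimal sketch that gives each worker thread its own boto3 session and S3 client. This is not Toil's implementation: the names `make_s3_client`, `get_client`, `import_one`, and `import_all` are hypothetical, and the sketch assumes the files are already on local disk and uploaded with boto3's `upload_file`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Per-thread storage, so no session or client is ever shared between threads.
_thread_local = threading.local()


def make_s3_client():
    """Create a fresh boto3 session and S3 client (called once per thread)."""
    import boto3  # imported lazily so the sketch loads without boto3 installed
    return boto3.session.Session().client("s3")


def get_client(factory=make_s3_client):
    """Return this thread's client, creating it on first use."""
    if not hasattr(_thread_local, "client"):
        _thread_local.client = factory()
    return _thread_local.client


def import_one(source_path, bucket, key, factory=make_s3_client):
    """Upload one local file using the calling thread's own client."""
    get_client(factory).upload_file(source_path, bucket, key)
    return key


def import_all(items, bucket, max_workers=2, factory=make_s3_client):
    """Upload (source_path, key) pairs concurrently; return keys in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(import_one, path, bucket, key, factory)
            for path, key in items
        ]
        return [f.result() for f in futures]
```

With two workers, something like `import_all([("sample_1.fastq.gz", "reads/1"), ("sample_2.fastq.gz", "reads/2")], "my-bucket")` would run both uploads at once; the bucket and key names here are placeholders.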

Coupling this with some kind of asynchronous / batch import interface would be even better.
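
One possible shape for such a batch interface is sketched below: imports are queued up front and the results are collected later. `BatchImporter` and its methods are hypothetical and not part of Toil. The `import_fn` argument stands in for whatever import call is used, and the sketch assumes that call is safe to run from several threads, which is exactly what this issue is asking for on S3.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


class BatchImporter:
    """Hypothetical helper: queue imports now, collect their results later."""

    def __init__(self, import_fn, max_workers=4):
        self._import_fn = import_fn
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}  # maps each pending future to its source URL

    def submit(self, url):
        """Schedule one import and return immediately with its future."""
        future = self._pool.submit(self._import_fn, url)
        self._futures[future] = url
        return future

    def results(self):
        """Block until every queued import finishes; return {url: result}."""
        done = {}
        for future in as_completed(self._futures):
            done[self._futures[future]] = future.result()
        self._futures.clear()
        return done

    def close(self):
        """Wait for outstanding work and release the worker threads."""
        self._pool.shutdown(wait=True)
```

A caller would `submit()` each URL, carry on with other setup, and call `results()` only when the imported file IDs are actually needed.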

Issue is synchronized with this Jira Story. Issue Number: TOIL-380

Issue Analytics

State:
Created 4 years ago
Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction

DailyDreaming commented, Apr 7, 2021

@w-gao @adamnovak Reference is here: https://boto3.amazonaws.com/v1/documentation/api/1.14.31/guide/session.html#multithreading-or-multiprocessing-with-sessions

0 reactions

unito-bot commented, May 17, 2021

➤ Adam Novak commented:

Lon says he is fixing this.

Top Results From Across the Web

Thread safe file rename in Amazon Web Services S3

I have an S3 directory that is regularly receiving files from an outside source. I have multiple processes that need to 'process' those...

Reusing S3 Connection in Threads #1512 - boto/boto3 - GitHub

I am attempting an upload of files to S3 using concurrent.futures.ThreadPoolExecutor in AWS Lambda. This is a sample of my code: from ...

Downloading files from S3 with multithreading and Boto3

Downloading files from S3 with multithreading and Boto3 ; Session · import boto3 ; Client · # Get all S3 buckets using the...

TransferManager (AWS SDK for Java - 1.12.370) - Amazon.com

Schedules a new transfer to download data from Amazon S3 and save it to the specified file. This method is non-blocking and returns...

S3 File upload & download with AWS Java SDK v2

Service clients in the SDK are thread-safe. For best performance, treat them as long-lived objects. Each client has its own connection pool ...
