Parallelize `umi_tools extract`

Hi,

I noticed that `umi_tools extract` takes a significant share (~8 hours) of the runtime of my overall analysis pipeline (200-300 million reads per sample/FASTQ). Would it be faster to first split the FASTQ, then run `umi_tools extract` on the parts with multiple threads, and finally merge the parts back together? I could probably do this manually in Snakemake, but I thought it would be more elegant to have it integrated into UMI-tools.

Thanks, Roman
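The split, extract, and merge idea from the question can be sketched as follows. This is a minimal illustration, not a feature of UMI-tools: the function names, the chunk file names, and the `bc_pattern` value are hypothetical, and it assumes single-end, uncompressed FASTQ with exactly four lines per record. It also assumes each read is extracted independently of the others, which may not hold for every `umi_tools extract` mode.

```python
import itertools
import shutil
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_fastq(fastq_path, out_dir, reads_per_chunk):
    """Split a FASTQ file into consecutive chunks of `reads_per_chunk` records."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    with open(fastq_path) as handle:
        for index in itertools.count():
            # Assumption: every record is exactly four lines long.
            lines = list(itertools.islice(handle, 4 * reads_per_chunk))
            if not lines:
                break
            chunk = out_dir / f"chunk_{index:04d}.fastq"
            chunk.write_text("".join(lines))
            chunks.append(chunk)
    return chunks


def extract_command(chunk_in, chunk_out, bc_pattern):
    """Build one `umi_tools extract` call for a single chunk."""
    return [
        "umi_tools", "extract",
        "--stdin", str(chunk_in),
        "--bc-pattern", bc_pattern,
        "--stdout", str(chunk_out),
    ]


def run_parallel(chunks, bc_pattern, workers, runner=subprocess.run):
    """Run one extract process per chunk; results come back in chunk order."""
    def work(chunk_in):
        chunk_out = chunk_in.with_name(chunk_in.stem + ".extracted.fastq")
        runner(extract_command(chunk_in, chunk_out, bc_pattern), check=True)
        return chunk_out

    # Threads suffice here: the heavy work happens in the child processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(work, chunks))


def merge_fastq(parts, merged_path):
    """Concatenate the processed chunks in their original order."""
    with open(merged_path, "w") as out:
        for part in parts:
            with open(part) as handle:
                shutil.copyfileobj(handle, out)
```

A call such as `merge_fastq(run_parallel(split_fastq("reads.fastq", "chunks", 1_000_000), "NNNNNNNN", workers=8), "extracted.fastq")` would tie the three steps together. For paired-end data, R1 and R2 would need to be split at identical record boundaries so that mates stay in the same chunk.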

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

3 reactions
vortexing commented, Jul 2, 2020

I have `umi_tools extract` running and only needed the R2 reads. I ended up using a FIFO to pipe the R2 reads from the stdout of `umi_tools` directly to STAR for alignment, and the entire process now takes half the time it did before. Previously I was saving both the R1 and the R2 reads as fastq.gz. Not writing out the R1 reads, and not having to compress either set of data, makes this run MUCH faster. Just an FYI for future googlers. 😉
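The FIFO approach described above might look roughly like this. The commenter's actual commands are not shown in the thread, so everything here is an assumption: the helper names are hypothetical, the input paths and barcode pattern are placeholders, and the `umi_tools` and STAR options (`--read2-in`, `--read2-stdout`, `--genomeDir`, `--readFilesIn`) should be checked against the installed versions. The sketch is POSIX-only because it relies on `os.mkfifo`.

```python
import os
import subprocess
import tempfile


def stream_stdout_through_fifo(producer_cmd, make_consumer_cmd):
    """Feed the producer's stdout to a consumer that reads a named pipe.

    `make_consumer_cmd` receives the FIFO path and returns the consumer's
    argument list. Returns (producer_returncode, consumer_returncode).
    """
    with tempfile.TemporaryDirectory() as tmp:
        fifo = os.path.join(tmp, "stream.fifo")
        os.mkfifo(fifo)
        # Start the reader first; it blocks until a writer appears.
        consumer = subprocess.Popen(make_consumer_cmd(fifo))
        # Opening for writing blocks until the consumer opens the pipe,
        # so this hangs if the consumer exits without ever reading it.
        with open(fifo, "wb") as sink:
            producer = subprocess.Popen(producer_cmd, stdout=sink)
            producer_rc = producer.wait()
        # The write end is now closed, so the consumer sees end-of-file.
        consumer_rc = consumer.wait()
    return producer_rc, consumer_rc


def example_commands(r1, r2, bc_pattern, genome_dir):
    """Hypothetical commands: emit only R2 from umi_tools, align with STAR."""
    producer_cmd = [
        "umi_tools", "extract",
        "--stdin", r1,
        "--read2-in", r2,
        "--bc-pattern", bc_pattern,
        "--read2-stdout",  # assumed: sends read 2 to stdout, discards read 1
    ]

    def make_consumer_cmd(fifo):
        return ["STAR", "--genomeDir", genome_dir, "--readFilesIn", fifo]

    return producer_cmd, make_consumer_cmd
```

Nothing is written to disk between the two tools and nothing is compressed, which is where the commenter reports the time savings came from. With real inputs, `stream_stdout_through_fifo(*example_commands(...))` would run both programs concurrently.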

2 reactions
paulranum11 commented, Sep 17, 2018

Just curious to know if any improvement to the umi_tools extract speed has been made or if any alternative tools are available that are faster. Any news?

Top Results From Across the Web

FAQ — UMI-tools documentation - Read the Docs
Can I run umi_tools with parallel threads? Not yet! ... If you'd like to help us out, get in touch! What's the correct...

Faster UMI extraction from scRNA-Seq data - Biostars
Using a test dataset of 50,000 reads from 1737 cells I ran this parallelized UMI extraction on 6 cores. The 6 core runtime...

UMIc: A Preprocessing Method for UMI Deduplication and ...
For UMI-tools, the UMI extraction required 158 s (∼2.5 min) to complete, ... of rare mutations with massively parallel sequencing. Proc.

Benchmarking UMI-based single-cell RNA-seq preprocessing ...
For example, UMI-tools introduced a network-based graph approach for ... in the hardware and parallelization settings used for evaluation.

zUMIs - A fast and flexible pipeline to process RNA ...
We compared Drop-seq-tools and UMI-tools with zUMIs using our HEK dataset (227 ... the saturation curve of exon+intron counting runs parallel to the...