
Parallelize `umi_tools extract`


Hi,

I noticed that `umi_tools extract` takes a significant amount of time (~8 hours) relative to the rest of my analysis pipeline (200-300 million reads per sample/FASTQ). Would it be faster to first split the FASTQ, run `umi_tools extract` on the parts in parallel, and finally merge the results? I could probably do this manually in Snakemake, but I thought it would be more elegant to have it integrated into UMI-tools, no?

Thanks, Roman
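
For readers who want to try the split/extract/merge idea by hand, here is a minimal sketch in Python. It is not part of UMI-tools and rests on several assumptions: the input is an uncompressed, single-end FASTQ with exactly four lines per read, `umi_tools` is on the PATH, and `extract` handles each read independently so that chunked output can simply be concatenated. The helper names (`split_fastq`, `extract_chunk`, `parallel_extract`) and the chunk size are hypothetical. Paired-end data would need both files split at identical read boundaries.

```python
import itertools
import shutil
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_fastq(path, reads_per_chunk, out_dir):
    """Split a plain-text FASTQ into files of at most reads_per_chunk reads.

    Assumes strict 4-line records (no wrapped sequence lines).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    with open(path) as handle:
        for index in itertools.count():
            lines = list(itertools.islice(handle, 4 * reads_per_chunk))
            if not lines:
                break
            chunk = out_dir / f"chunk_{index:04d}.fastq"
            chunk.write_text("".join(lines))
            chunks.append(chunk)
    return chunks


def extract_chunk(chunk, bc_pattern):
    """Run umi_tools extract on one chunk and return the output path."""
    out = chunk.with_name(chunk.stem + ".extracted.fastq")
    subprocess.run(
        ["umi_tools", "extract", f"--bc-pattern={bc_pattern}",
         "-I", str(chunk), "-S", str(out)],
        check=True,
    )
    return out


def parallel_extract(fastq, merged, bc_pattern, reads_per_chunk=1_000_000,
                     workers=4, worker=extract_chunk):
    """Split, process chunks concurrently, then merge in the original order."""
    with tempfile.TemporaryDirectory() as tmp:
        chunks = split_fastq(fastq, reads_per_chunk, tmp)
        # Threads are enough here: the heavy work happens in child processes.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # pool.map yields results in input order, so read order is kept.
            outputs = list(pool.map(lambda c: worker(c, bc_pattern), chunks))
        with open(merged, "wb") as dest:
            for part in outputs:
                with open(part, "rb") as src:
                    shutil.copyfileobj(src, dest)
```

A call might look like `parallel_extract("sample.fastq", "sample.extracted.fastq", "NNNNNN", workers=8)`, where the file names and barcode pattern are placeholders. The temporary chunks are uncompressed, so at 200-300 million reads this trades disk space for wall-clock time.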

Issue Analytics

Created 5 years ago
Comments: 8 (3 by maintainers)

Top GitHub Comments


vortexing commented, Jul 2, 2020

I run `umi_tools extract` and only need the R2 reads. I ended up using a FIFO to pipe the R2 reads from umi_tools' stdout directly into STAR for alignment, and the entire process now takes half the time it did before. Previously I was saving both the R1 and R2 reads as fastq.gz files. Not writing out the R1 reads, and not having to compress either set of data, makes this run MUCH faster. Just an FYI for future googlers. 😉
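
The comment does not include the actual commands, so the following is only a sketch of how such a FIFO setup might look in Python on a POSIX system. Everything specific here is an assumption: the helper names `build_commands` and `run_streamed`, the file paths, the log file name, and the minimal set of STAR options. It relies on the `--read2-stdout` option of `umi_tools extract` to emit only R2, and passes `-L` so that log messages go to a file rather than into the read stream.

```python
import os
import subprocess
import tempfile


def build_commands(r1, r2, fifo, bc_pattern, genome_dir, log="extract.log"):
    """Return (extract_cmd, star_cmd) as argument lists. Paths are placeholders."""
    extract_cmd = [
        "umi_tools", "extract",
        f"--bc-pattern={bc_pattern}",
        "-I", r1,
        f"--read2-in={r2}",
        "--read2-stdout",   # write only R2 to stdout, R1 is not written out
        "-L", log,          # keep log output off stdout
    ]
    star_cmd = ["STAR", "--genomeDir", genome_dir, "--readFilesIn", fifo]
    return extract_cmd, star_cmd


def run_streamed(r1, r2, bc_pattern, genome_dir):
    """Stream uncompressed R2 reads from umi_tools into STAR through a FIFO."""
    with tempfile.TemporaryDirectory() as tmp:
        fifo = os.path.join(tmp, "r2.fifo")
        os.mkfifo(fifo)  # POSIX only
        extract_cmd, star_cmd = build_commands(r1, r2, fifo, bc_pattern, genome_dir)
        aligner = subprocess.Popen(star_cmd)
        # Opening the FIFO for writing blocks until STAR opens it for reading.
        with open(fifo, "wb") as sink:
            subprocess.run(extract_cmd, stdout=sink, check=True)
        # Closing the write end signals end-of-input to STAR.
        if aligner.wait() != 0:
            raise RuntimeError("STAR exited with a non-zero status")
```

One caveat with this sketch: if STAR fails before it opens the FIFO, the `open` call will wait indefinitely, so a production version would want a timeout or a watchdog around it.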


paulranum11 commented, Sep 17, 2018

Just curious to know if any improvement to the umi_tools extract speed has been made or if any alternative tools are available that are faster. Any news?
