Parallelize `umi_tools extract`
Hi,
I noticed that running `umi_tools extract` is taking a significant amount of time (~8 hours) with respect to the overall analysis pipeline that I have (200-300 million reads per sample/FASTQ). Could it be faster to first split the FASTQ, then run `umi_tools extract` with multiple threads and finally merge the parts together? I could probably do this manually in snakemake but I thought it would be more elegant to have it integrated into umi-tools, no?
Thanks, Roman
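The split, extract, merge idea raised in the question can be sketched outside of umi-tools. The following is a minimal Python sketch, not a feature of umi-tools itself. It assumes an uncompressed single-end FASTQ with strictly four lines per record; the helper names (`split_fastq`, `extract_cmd`, `merge_parts`, `parallel_extract`), the chunk size and the file names are all hypothetical. For paired-end data, R1 and R2 would have to be split at identical record boundaries, and per-chunk logs would need to be combined separately.

```python
import itertools
import shutil
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_fastq(path, records_per_chunk, out_dir):
    """Split a plain-text FASTQ into chunk files holding whole 4-line records."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    with open(path) as handle:
        while True:
            lines = list(itertools.islice(handle, 4 * records_per_chunk))
            if not lines:
                break
            chunk = out_dir / f"chunk_{len(chunks):04d}.fastq"
            chunk.write_text("".join(lines))
            chunks.append(chunk)
    return chunks


def extract_cmd(chunk, out, bc_pattern):
    """Build one single-end `umi_tools extract` call for one chunk."""
    return ["umi_tools", "extract", f"--bc-pattern={bc_pattern}",
            f"--stdin={chunk}", f"--stdout={out}"]


def merge_parts(parts, out):
    """Concatenate the per-chunk outputs in their original order."""
    with open(out, "w") as dst:
        for part in parts:
            with open(part) as src:
                shutil.copyfileobj(src, dst)


def parallel_extract(fastq, out, bc_pattern, workers=4,
                     records_per_chunk=1_000_000, work_dir="chunks"):
    """Split, run umi_tools on each chunk concurrently, then merge."""
    chunks = split_fastq(fastq, records_per_chunk, work_dir)
    outs = [c.with_name(c.stem + ".extracted.fastq") for c in chunks]

    def run(pair):
        subprocess.run(extract_cmd(pair[0], pair[1], bc_pattern), check=True)

    # Threads only wait on child processes, so real parallelism comes
    # from the separate umi_tools processes they launch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(run, zip(chunks, outs)))
    merge_parts(outs, out)
```

A call such as `parallel_extract("sample.fastq", "sample.extracted.fastq", "NNNNNNNN", workers=8)` would then run eight extractions at a time. Whether this is faster overall depends on how much time the split and merge steps add on the storage in use.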
Issue Analytics
- State:
- Created 5 years ago
- Comments:8 (3 by maintainers)
Top GitHub Comments
I have `umi_tools extract` running and only needed the R2 reads. I ended up using a FIFO to pipe the R2 reads from the stdout of `umi_tools` directly to STAR to align those reads, and the entire process now takes half the time it did before. Previously I was saving both the R1 reads and the R2 reads as fastq.gz. Not writing out the R1 reads, and not having to compress either set of data, makes this run MUCH faster. Just an FYI for future googlers. 😉

Just curious to know if any improvement to the `umi_tools extract` speed has been made, or if any faster alternative tools are available. Any news?
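The FIFO approach from the first comment can be sketched as follows. This is a minimal Python sketch rather than the commenter's actual command line, which was not posted. It assumes a POSIX system, and that the installed umi_tools version supports `--read2-stdout` for sending read 2 to the output stream. The barcode pattern, file names, index directory and thread count are placeholders, and `run_through_fifo`, `umi_cmd` and `star_cmd` are hypothetical helper names.

```python
import os
import subprocess
import tempfile


def run_through_fifo(make_producer_cmd, make_consumer_cmd):
    """Run two commands at once, connected through a named pipe.

    Each argument is a function that takes the FIFO path and returns
    an argument list. Returns (producer_returncode, consumer_returncode).
    """
    with tempfile.TemporaryDirectory() as tmp:
        fifo = os.path.join(tmp, "r2.fastq")
        os.mkfifo(fifo)
        # Each side blocks on opening the pipe until the other side opens
        # it too, so both processes must be started before waiting.
        consumer = subprocess.Popen(make_consumer_cmd(fifo))
        producer = subprocess.Popen(make_producer_cmd(fifo))
        producer_rc = producer.wait()
        consumer_rc = consumer.wait()
    return producer_rc, consumer_rc


def umi_cmd(fifo):
    """Write extracted R2 reads, uncompressed, into the FIFO (placeholders)."""
    return ["umi_tools", "extract",
            "--bc-pattern=CCCCCCCCCCCCCCCCNNNNNNNNNN",
            "--stdin=R1.fastq.gz", "--read2-in=R2.fastq.gz",
            "--read2-stdout", f"--stdout={fifo}"]


def star_cmd(fifo):
    """Align reads straight from the FIFO (index path is a placeholder)."""
    return ["STAR", "--genomeDir", "star_index",
            "--readFilesIn", fifo, "--runThreadN", "8"]
```

Calling `run_through_fifo(umi_cmd, star_cmd)` would start both tools and let the extracted R2 reads flow to the aligner without an intermediate file on disk. Note that if the producer exits before opening the pipe, the consumer can block indefinitely, so a production version would need a timeout or cleanup step.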