some samples running easily, others never finishing with dedup
Hi folks,
I’m using umi_tools 1.0.0 on two cohorts of miRNA BAM files.
Set 1 has about 40 million reads per BAM, with the UMI stored in the RX tag in a format like “CAGC-CCAC”.
Set 2 has about 10 million reads per BAM, with slightly longer UMIs in the RX tag, e.g. “AACCTC-AAATTG”.
All dedup commands look like one of the following (I’ve tried both and gotten similar results):
umi_tools dedup -I ${1} --extract-umi-method=tag --umi-tag=RX -S ${1}.umi_tools_100_deduplicated.bam --output-stats=${1}.umi_tools_100_deduplicated.stats
umi_tools dedup -I ${1} --extract-umi-method=tag --umi-tag=RX --read-length -S ${1}.umi_tools_100_deduplicated_read_length.bam --output-stats=${1}.umi_tools_100_deduplicated_read_length.stats
My Set 1 commands dependably finish in less than a day. About half of the Set 2 jobs are killed on my cluster after their RAM usage exceeds 355 GB.
Do you have any suggestions, or things I could look into, to get this running well on all my samples?
Thanks, Richard
Issue Analytics
- State:
- Created 4 years ago
- Comments:6 (3 by maintainers)
Top GitHub Comments
See https://umi-tools.readthedocs.io/en/latest/faq.html for advice on speeding up/memory usage.
The running time/memory is far more dependent on the length of the UMI and the level of duplication than it is on the total number of reads.
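A back-of-the-envelope illustration of the UMI-length point (my own arithmetic, not UMI-tools code, and it ignores the “-” separator): the space of possible UMI sequences grows as 4^length, so the 12 nt UMIs in Set 2 admit vastly more distinct values per position than the 8 nt UMIs in Set 1, which means larger UMI networks to resolve and more memory:

```shell
# Possible UMI sequences grow as 4^length (A/C/G/T per base).
set1=$(( 4 ** 8 ))   # 8 nt UMI, e.g. CAGC-CCAC  -> 65536
set2=$(( 4 ** 12 ))  # 12 nt UMI, e.g. AACCTC-AAATTG -> 16777216
echo "Set 1 UMI space: ${set1}"
echo "Set 2 UMI space: ${set2}"
```

So even with a quarter of the reads, Set 2 positions can accumulate far more distinct UMIs, which is consistent with its much higher memory use.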
The biggest improvement you can make here is to skip generating the stats. Stats generation is by far the largest time and memory hog, since it randomly samples reads from the file to compute a null distribution.
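Following that advice, a re-run of the problematic samples would use the same command as in the question with only the --output-stats flag dropped (a sketch; ${1} is the input BAM, as in the original commands):

```shell
# Build the dedup command from the question, minus --output-stats
# (the main time/memory hog per the maintainers).
bam="${1:-sample.bam}"
cmd="umi_tools dedup -I ${bam} --extract-umi-method=tag --umi-tag=RX -S ${bam}.umi_tools_100_deduplicated.bam"
echo "${cmd}"
# Run it with: eval "${cmd}"
```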
Hi Richard,
I hope you eventually managed to find a satisfactory way through this.
We are currently in the process of applying for funding to make a real improvement in the efficiency of UMI-tools. If you are still interested in the tool, I wondered if you might be able to support the application by writing a letter saying how useful it would be for you if UMI-tools ran faster and used less memory?