Missing deduplicated output BAM

Hello UMI-tools developers,

First of all, thank you for developing such an amazing tool. I am currently running umi_tools dedup on single-end data. Here is the first line of my BAM, with the UMI ACCCTAGTGGT:

CL100138864L1C004R011_126688_ACCCTAGTGGT 16 1 3003754 255 20M * 0 0 AATCCACATGCTCCGCCGCT DDCDEEDEEEECBDDDDDDD XA:i:2 MD:Z:13A3T2 NM:i:2 XM:i:2

The file has roughly 30 million reads. I ran the tool as follows:

umi_tools dedup \
        --stdin=${input_bam} \
        --stdout=${output_bam} \
        --log=${logfile} \
        --verbose=10

In my log file, I see the following:

2020-04-22 11:53:56,697 INFO Written out 100000 reads
2020-04-22 11:55:59,982 INFO Written out 200000 reads
2020-04-22 11:56:11,232 INFO Written out 300000 reads
2020-04-22 13:35:17,819 INFO Written out 400000 reads
2020-04-22 13:35:35,167 INFO Written out 500000 reads
2020-04-22 13:35:51,753 INFO Written out 600000 reads
2020-04-22 13:35:52,221 INFO Parsed 1000000 input reads
2020-04-22 14:50:11,438 INFO Written out 700000 reads
2020-04-22 14:51:53,432 INFO Written out 800000 reads
2020-04-22 14:52:02,158 INFO Written out 900000 reads
2020-04-22 15:13:05,851 INFO Written out 1000000 reads
2020-04-22 15:13:07,990 INFO Written out 1100000 reads
2020-04-22 15:13:44,472 INFO Parsed 2000000 input reads
2020-04-22 15:51:32,368 INFO Written out 1200000 reads
2020-04-22 15:51:46,719 INFO Written out 1300000 reads

As of now, I’m already using 180G of memory. Still, I wonder why the output BAM remains empty. What am I doing wrong?

Top GitHub Comments

IanSudbery commented, Apr 22, 2020

UMI-tools first writes the BAM output to a temporary file, then sorts it before writing it to the final destination, which is why the output BAM stays empty while the tool is still running. From the memory usage, it sounds like in your case the temporary file may be stored in memory rather than on disk, which can happen depending on the system configuration. You can change the temporary directory location with --tmp-dir=, or you can skip the temporary file and the sort with --no-sort-output, which writes the unsorted output directly to the output file.

Ian
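
For readers who want to try the suggestions above, here is a minimal shell sketch, not a tested recipe. It first checks whether the temporary directory is memory-backed (it assumes Linux with GNU coreutils `stat`, and that the default temporary directory is `$TMPDIR` or `/tmp`), then runs dedup with `--no-sort-output` and sorts the result in a separate `samtools sort` step. All file names and the scratch path are hypothetical placeholders, and `fs_type` and `is_memory_backed` are helper names made up for this sketch. If you would rather keep the built-in sort and use the temporary-directory option from the comment, confirm its exact spelling with `umi_tools dedup --help` first.

```shell
#!/bin/sh
# Sketch only: file names and the scratch path are hypothetical placeholders.
set -eu

# Print the filesystem type backing a directory (GNU coreutils stat, Linux).
# Falls back to "unknown" if stat does not support these options.
fs_type() {
    stat -f -c %T "$1" 2>/dev/null || echo "unknown"
}

# Succeed if the given filesystem type keeps its files in RAM.
is_memory_backed() {
    case "$1" in
        tmpfs|ramfs) return 0 ;;
        *) return 1 ;;
    esac
}

tmp_dir="${TMPDIR:-/tmp}"
tmp_fs="$(fs_type "${tmp_dir}")"
if is_memory_backed "${tmp_fs}"; then
    echo "${tmp_dir} is ${tmp_fs}: temporary files written there consume RAM"
else
    echo "${tmp_dir} is ${tmp_fs}"
fi

# Workaround from the comment: skip the internal sort, then sort separately.
input_bam="input.bam"              # hypothetical input
unsorted_bam="dedup.unsorted.bam"  # hypothetical output names
sorted_bam="dedup.sorted.bam"
scratch="/path/to/disk/scratch"    # hypothetical disk-backed directory

# Only run when both tools and the placeholder paths actually exist.
if command -v umi_tools >/dev/null && command -v samtools >/dev/null \
        && [ -f "${input_bam}" ] && [ -d "${scratch}" ]; then
    umi_tools dedup \
        --stdin="${input_bam}" \
        --stdout="${unsorted_bam}" \
        --no-sort-output
    # -T sets the prefix for samtools' own temporary files, -o the output file.
    samtools sort -T "${scratch}/dedup_sort" -o "${sorted_bam}" "${unsorted_bam}"
fi
```

With `--no-sort-output`, reads should start appearing in the output file as they are written rather than only at the end, which also makes it easier to see progress on a long run.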

cagaser commented, Apr 30, 2020

I just checked the number of reads at the last position where the tool failed; there are about 4600 reads at this position. Still, was the allocated 180G not enough?