Missing deduplicated output BAM

Hello UMI-tools developers,

First of all, thank you for developing such an amazing tool. I am currently running umi_tools dedup on single-end data. Here is the first line of my BAM, with the UMI ACCCTAGTGGT:

CL100138864L1C004R011_126688_ACCCTAGTGGT 16 1 3003754 255 20M * 0 0 AATCCACATGCTCCGCCGCT DDCDEEDEEEECBDDDDDDD XA:i:2 MD:Z:13A3T2 NM:i:2 XM:i:2

The file has roughly 30 million reads. I ran the tool as follows:

umi_tools dedup \
        --stdin=${input_bam} \
        --stdout=${output_bam} \
        --log=${logfile} \
        --verbose=10

In my log file, I see the following:

2020-04-22 11:53:56,697 INFO Written out 100000 reads
2020-04-22 11:55:59,982 INFO Written out 200000 reads
2020-04-22 11:56:11,232 INFO Written out 300000 reads
2020-04-22 13:35:17,819 INFO Written out 400000 reads
2020-04-22 13:35:35,167 INFO Written out 500000 reads
2020-04-22 13:35:51,753 INFO Written out 600000 reads
2020-04-22 13:35:52,221 INFO Parsed 1000000 input reads
2020-04-22 14:50:11,438 INFO Written out 700000 reads
2020-04-22 14:51:53,432 INFO Written out 800000 reads
2020-04-22 14:52:02,158 INFO Written out 900000 reads
2020-04-22 15:13:05,851 INFO Written out 1000000 reads
2020-04-22 15:13:07,990 INFO Written out 1100000 reads
2020-04-22 15:13:44,472 INFO Parsed 2000000 input reads
2020-04-22 15:51:32,368 INFO Written out 1200000 reads
2020-04-22 15:51:46,719 INFO Written out 1300000 reads

As of now, I’m already using 180G of memory. Still, I wonder why the output BAM has remained empty. What am I doing wrong?

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

IanSudbery commented, Apr 22, 2020

UMI-tools first writes the BAM output to a temporary file, then sorts it before writing to the final destination. From the memory usage, it sounds like in your case the temporary file may be stored in memory rather than on disk, which can happen depending on the system configuration. You can change the temporary directory location with --temp-dir=, or you can skip the temporary file and sorting step with --no-sort-output, which writes the unsorted output directly to the output file.

Ian
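
A minimal shell sketch of the two workarounds described above, assuming a Linux system with GNU `df`. It first reports whether the default temporary location is memory-backed (`tmpfs`), then wraps the two alternative invocations in functions. The function names and the `/scratch/umi_tmp` path are hypothetical, and the temporary-directory option is written as `--temp-dir`, the spelling assumed here for current UMI-tools releases; confirm it with `umi_tools dedup --help`.

```shell
# Temporary location used when no directory is given explicitly.
# Assumption: TMPDIR if set, otherwise /tmp as the usual system default.
tmp_location="${TMPDIR:-/tmp}"

# Report the filesystem type of that location.
# "tmpfs" means the directory is backed by RAM, not disk.
# (-T is not POSIX; this assumes GNU coreutils df.)
fs_type="$(df -PT "${tmp_location}" 2>/dev/null | awk 'NR==2 {print $2}')"
echo "${tmp_location} is on filesystem type: ${fs_type:-unknown}"

# Workaround 1: keep the sorted output, but put temporary files on disk.
# Arguments: input BAM, output BAM, log file, on-disk temporary directory.
dedup_on_disk() {
    umi_tools dedup \
        --stdin="$1" \
        --stdout="$2" \
        --log="$3" \
        --temp-dir="$4"
}

# Workaround 2: skip the temporary file and the sort step entirely.
# Arguments: input BAM, output BAM, log file.
dedup_unsorted() {
    umi_tools dedup \
        --stdin="$1" \
        --stdout="$2" \
        --log="$3" \
        --no-sort-output
}
```

For example, `dedup_on_disk "${input_bam}" "${output_bam}" "${logfile}" /scratch/umi_tmp` keeps the sorted output while moving temporary files to disk. With `dedup_unsorted`, the output would need to be coordinate-sorted afterwards (for instance with `samtools sort`) before it can be indexed.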

cagaser commented, Apr 30, 2020

I just checked the number of reads at the last position where the tool failed; there are about 4600 reads at this position. Still, was the allocated 180G not enough?
