Missing deduplicated output BAM
Hello UMI-tools developers,
First of all, thank you for developing such an amazing tool.
I am currently running umi_tools dedup on single-end data.
Here is the first line of my BAM, with the UMI ACCCTAGTGGT:
CL100138864L1C004R011_126688_ACCCTAGTGGT 16 1 3003754 255 20M * 0 0 AATCCACATGCTCCGCCGCT DDCDEEDEEEECBDDDDDDD XA:i:2 MD:Z:13A3T2 NM:i:2 XM:i:2
The BAM has roughly 30 million reads.
I ran the tool as follows:
umi_tools dedup \
--stdin=${input_bam} \
--stdout=${output_bam} \
--log=${logfile} \
--verbose=10
In my log file, I see the following:
2020-04-22 11:53:56,697 INFO Written out 100000 reads
2020-04-22 11:55:59,982 INFO Written out 200000 reads
2020-04-22 11:56:11,232 INFO Written out 300000 reads
2020-04-22 13:35:17,819 INFO Written out 400000 reads
2020-04-22 13:35:35,167 INFO Written out 500000 reads
2020-04-22 13:35:51,753 INFO Written out 600000 reads
2020-04-22 13:35:52,221 INFO Parsed 1000000 input reads
2020-04-22 14:50:11,438 INFO Written out 700000 reads
2020-04-22 14:51:53,432 INFO Written out 800000 reads
2020-04-22 14:52:02,158 INFO Written out 900000 reads
2020-04-22 15:13:05,851 INFO Written out 1000000 reads
2020-04-22 15:13:07,990 INFO Written out 1100000 reads
2020-04-22 15:13:44,472 INFO Parsed 2000000 input reads
2020-04-22 15:51:32,368 INFO Written out 1200000 reads
2020-04-22 15:51:46,719 INFO Written out 1300000 reads
As of now, I’m already using 180G of memory. Still, I wonder why the output BAM remains empty. What am I doing wrong?
Top GitHub Comments
UMI-tools first writes the BAM to a temporary file, and then sorts it before writing to the final destination. From the memory usage, it sounds like in your case the temporary file may be stored in memory rather than on disk, which can happen depending on the system configuration. You can change the temporary directory location with
--temp-dir=
or you can disable writing to a temporary file and then sorting with
--no-sort-output
which will write the unsorted output directly to the output file.
Ian
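The two suggestions above could be tried roughly as follows. This is only a sketch: it assumes the temporary-directory flag is spelled `--temp-dir`, as in the UMI-tools documentation (check `umi_tools dedup --help` for your version). The paths, the `scratch_dir` variable, and the helper function names are hypothetical placeholders, not part of UMI-tools. The first helper prints the filesystem backing the temporary directory; on Linux, a value of `tmpfs` usually means the directory lives in memory.

```shell
# Placeholder paths (assumptions); replace them with real locations.
input_bam="input.bam"
output_bam="dedup.bam"
logfile="dedup.log"
scratch_dir="/path/to/scratch"   # hypothetical directory on a real disk

# Print the filesystem backing a directory (default: $TMPDIR, else /tmp).
# On Linux, "tmpfs" typically indicates a memory-backed filesystem.
temp_dir_filesystem() {
    df -P "${1:-${TMPDIR:-/tmp}}" | awk 'NR == 2 { print $1 }'
}

# Option 1: keep the sorted output, but put the temporary file in scratch_dir.
dedup_with_temp_dir() {
    umi_tools dedup \
        --stdin="$input_bam" \
        --stdout="$output_bam" \
        --log="$logfile" \
        --temp-dir="$scratch_dir"
}

# Option 2: skip the temporary file and write unsorted output directly.
dedup_unsorted() {
    umi_tools dedup \
        --stdin="$input_bam" \
        --stdout="$output_bam" \
        --log="$logfile" \
        --no-sort-output
}
```

After sourcing these definitions, run `temp_dir_filesystem` first, then call either `dedup_with_temp_dir` or `dedup_unsorted`. With the second option the output BAM is unsorted, so it may need sorting before any downstream tool that expects coordinate order.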
I just checked the number of reads at the last position where the tool failed; there are about 4600 reads at this position. Still, was the allocated 180G not enough?