UmiAwareMarkDuplicatesWithMateCigar random error

Bug Report

Affected tool(s)

UmiAwareMarkDuplicatesWithMateCigar

Affected version(s)

  • picard/2.18.16
  • picard/2.27.7

Description

Running the UmiAwareMarkDuplicatesWithMateCigar tool with the command below intermittently fails with the following error.

java -jar picard.jar UmiAwareMarkDuplicatesWithMateCigar I=input.bam CREATE_INDEX=true UMI_METRICS=md_metrics M=output.txt OUTPUT=test.bam ASSUME_SORT_ORDER=coordinate TAG_DUPLICATE_SET_MEMBERS=true MAX_RECORDS_IN_RAM=400000

Exception in thread "main" htsjdk.samtools.SAMException: The input records were not sorted in duplicate order:
MN00975:67:000H2WVYG:1:21110:2374:3570  147     chr3    128485830       60      151M    =       128485833       -148       AGTCGCCGGCACTTAGGAGGGGTAGGTGGGGATGGGGTGGTGTGTAGCAGGCTGGGTGCCCATAGTAGCTAGGCCTGGGCGCAGGGGACTGCCACTTTCCATCTTCATGCTCTCCGTCAGTGACACCTGGTACTTGACGCCGTCCTTGTCC    //FF/A/FF6/=////F/A6=F//FFFFFF=FFFA/=/FFF/F/FFF/FFF//FFFFAFFFFFFFFFF/FFFF/AAFAF/AFFFF/AAFFF//FFF6FFF6F6FFF/FFFFFF//F/FA/FFAAF//F/FFFFFFFF/FFA/6F6AAF//A    MC:Z:13S130M    MD:Z:2C9A138       RG:Z:000H2WVYG.AE4441.L001      NM:i:2  MQ:i:60 AS:i:143        XS:i:0  QX:Z:FFFFFFFF   RX:Z:ACCCTATA
MN00975:67:000H2WVYG:1:11107:15346:8737 163     chr1    153346  0       10S40M21S       =       153346  40      AGCACCATCACCACTTACCTTGTCCTGTGCATCTCTTTCATTGGCTGTTCACTCCTGGCGGTTATCGGTAA    FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFFFFAF    MC:Z:10S40M13S  MD:Z:40 RG:Z:000H2WVYG.AE4441.L001      NM:i:0  MQ:i:0  AS:i:40 XS:i:40    QX:Z:FFFFFFFA   RX:Z:ATCGGTAA

        at htsjdk.samtools.DuplicateSetIterator.next(DuplicateSetIterator.java:152)
        at picard.sam.markduplicates.UmiAwareDuplicateSetIterator.next(UmiAwareDuplicateSetIterator.java:119)
        at picard.sam.markduplicates.UmiAwareDuplicateSetIterator.next(UmiAwareDuplicateSetIterator.java:53)
        at picard.sam.markduplicates.SimpleMarkDuplicatesWithMateCigar.doWork(SimpleMarkDuplicatesWithMateCigar.java:126)
        at picard.sam.markduplicates.UmiAwareMarkDuplicatesWithMateCigar.doWork(UmiAwareMarkDuplicatesWithMateCigar.java:138)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

Steps to reproduce

This error is not specific to this file, and it does not occur every time I run the command; it is intermittent. Across many test runs, it occurs roughly once in every 10 runs of the tool.
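
Since the failure is intermittent, a small harness that reruns the same command and counts how often the exception appears can help quantify the failure rate. The following is a minimal sketch, not part of the original report: the function name `count_sort_order_failures` and its `run` parameter are hypothetical, and it assumes the exception text shows up in the process's stderr or stdout.

```python
import subprocess
from typing import Callable, Sequence

# Message copied from the exception shown in this report.
ERROR_MARKER = "The input records were not sorted in duplicate order"


def count_sort_order_failures(
    command: Sequence[str],
    runs: int,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> int:
    """Run `command` `runs` times and count the runs that hit the error.

    `run` is injectable (hypothetical parameter) so the loop can be
    exercised without invoking the real tool.
    """
    failures = 0
    for _ in range(runs):
        result = run(list(command), capture_output=True, text=True)
        # Assumption: the exception is printed to stderr or stdout.
        output = (result.stderr or "") + (result.stdout or "")
        if result.returncode != 0 and ERROR_MARKER in output:
            failures += 1
    return failures
```

To use it, pass the command from this report as a list of arguments (for example `["java", "-jar", ..., "I=input.bam", ...]`) together with the number of repetitions; the return value divided by `runs` gives an observed failure rate.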

Expected behavior

A BAM file in which UMI-aware duplicate reads are marked.

Actual behavior

An empty BAM file and a Java exception.

Thank you all for your help and for the great work you’ve been doing developing Picard.

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 18 (9 by maintainers)

Top GitHub Comments

fleharty commented, Oct 14, 2020

Per @yfarjoun's comment on another thread:

It seems that this bug is related to a multithreading race condition in samtools/htsjdk#1516. Now that this has been identified, I think a fix will be forthcoming.

sybrohee commented, Nov 5, 2020

Great… many thanks
