MarkDuplicate crash

See original GitHub issue

Bug Report

Affected tool(s)

MarkDuplicate

Affected version(s)

  • [x] Latest public release version [2.18.1]
  • [ ] Latest development/master branch as of [date of test?]

Description

This seems to be a common error, already reported several years ago here https://www.biostars.org/p/60263/ and here #72. But since it still crashes and it is not clear to me how to solve it, I am reporting it again. The input BAM file comes from bwa mem; the output was converted to BAM and sorted with samtools, so it should be valid.

-bash-4.1$ java -jar -Djava.io.tmpdir=./tmp ~grizk/bin/picard.2.18.1.jar MarkDuplicates I=$aln_sorted_BAM O=$aln_sorted_dedup_BAM M=$dedupMetrics ASSUME_SORTED=true
16:41:32.707 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/symbiose/grizk/bin/picard.2.18.1.jar!/com/intel/gkl/native/libgkl_compression.so
[Fri Mar 30 16:41:32 CEST 2018] MarkDuplicates INPUT=[/WORKS/grizk/vc/raw_ch17.aln.sorted.bam] OUTPUT=/WORKS/grizk/vc/raw_ch17.aln.sorted.dedup.bam METRICS_FILE=/WORKS/grizk/vc/raw_ch17.dedupMetrics.txt ASSUME_SORTED=true    MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true ADD_PG_TAG_TO_READS=true REMOVE_DUPLICATES=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Fri Mar 30 16:41:32 CEST 2018] Executing as grizk@cl1n028.genouest.org on Linux 2.6.32-431.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.18.1-SNAPSHOT
INFO	2018-03-30 16:41:32	MarkDuplicates	Start of doWork freeMemory: 2038352320; totalMemory: 2058354688; maxMemory: 28631367680
INFO	2018-03-30 16:41:32	MarkDuplicates	Reading input file and constructing read end information.
INFO	2018-03-30 16:41:32	MarkDuplicates	Will retain up to 103736839 data points before spilling to disk.
WARNING	2018-03-30 16:41:34	AbstractOpticalDuplicateFinderCommandLineProgram	A field field parsed out of a read name was expected to contain an integer and did not. Read name: ERR174324.81165065. Cause: String 'ERR174324.81165065' did not start with a parsable number.

[Fri Mar 30 16:41:38 CEST 2018] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0,10 minutes.
Runtime.totalMemory()=2058354688
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: Value was put into PairInfoMap more than once.  16: null:ERR174326.21638423
	at htsjdk.samtools.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:133)
	at htsjdk.samtools.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:86)
	at picard.sam.markduplicates.util.DiskBasedReadEndsForMarkDuplicatesMap.remove(DiskBasedReadEndsForMarkDuplicatesMap.java:61)
	at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:528)
	at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:232)
	at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:282)
	at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
	at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)

Steps to reproduce

Command line: java -jar -Djava.io.tmpdir=./tmp ~grizk/bin/picard.2.18.1.jar MarkDuplicates I=$aln_sorted_BAM O=$aln_sorted_dedup_BAM M=$dedupMetrics ASSUME_SORTED=true

Expected behavior

It should not crash.

Actual behavior

It crashes with htsjdk.samtools.SAMException: Value was put into PairInfoMap more than once.


Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 1
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
yfarjoun commented, Mar 30, 2018

Your read names (ERR174324.81165065) do not conform to the expected READ_NAME_REGEX (<optimized capture of last three ':' separated fields as numeric values>), so optical duplicates cannot be identified. You can either:

  1. fix your read names so that they are conformant, or
  2. run with READ_NAME_REGEX=null to disable optical duplicate finding.
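To illustrate why the warning in the log appears, here is a minimal Python sketch of what "capture of last three ':' separated fields as numeric values" means. It is a simplified approximation written for this page, not Picard's actual parser; the function name `last_three_numeric_fields` and the Illumina-style read name in the second call are hypothetical.

```python
import re


def last_three_numeric_fields(read_name):
    """Return the last three ':'-separated fields as ints, or None.

    Simplified stand-in for the default read-name parsing described
    in the log; this is NOT Picard's implementation.
    """
    fields = read_name.split(":")
    if len(fields) < 3:
        return None  # not enough ':'-separated fields
    values = []
    for field in fields[-3:]:
        match = re.match(r"-?\d+", field)  # field must start with an integer
        if match is None:
            return None
        values.append(int(match.group()))
    return tuple(values)


# Read name taken from the warning in the report: no ':' at all.
print(last_three_numeric_fields("ERR174324.81165065"))
# Hypothetical Illumina-style name with tile, x and y as the last fields.
print(last_three_numeric_fields("HWI-ST1234:100:C0ABCACXX:1:1101:1234:5678"))
```

Names like the one in the report carry no position fields to parse, which is why disabling optical duplicate detection with READ_NAME_REGEX=null is one of the two suggested options.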

0 reactions
chzh1418 commented, Oct 27, 2018

great!

It’s a bit difficult as that message is coming from htsjdk… but you are right that the message is rather cryptic.

Hi, I also have the same read name several times in my bam file. How should I solve this problem? Thanks, Chi
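A repeated read name of this kind is one plausible cause of the "Value was put into PairInfoMap more than once" exception. As a rough way to check for it, the Python sketch below counts primary SAM records per read name and mate. It assumes the input is SAM text (for example, the output of samtools view -h); the function name `repeated_primary_records` and the example records, including the `chr17` contig name, are made up for illustration.

```python
from collections import Counter

# FLAG bits from the SAM specification.
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def repeated_primary_records(sam_lines):
    """Map (read name, mate number) to its count where the count exceeds 1."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # only primary alignments are counted
        mate = 1 if flag & FIRST_IN_PAIR else 2 if flag & SECOND_IN_PAIR else 0
        counts[(name, mate)] += 1
    return {key: n for key, n in counts.items() if n > 1}


# Made-up records: the first mate of one pair appears twice.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "ERR174326.21638423\t99\tchr17\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
    "ERR174326.21638423\t99\tchr17\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
    "ERR174326.21638423\t147\tchr17\t200\t60\t4M\t=\t100\t-104\tACGT\tIIII",
]
print(repeated_primary_records(example))
```

If a check like this reports repeats, the usual suspects are input FASTQ files that were concatenated or aligned twice, or BAM files merged without unique read names; whether that applies here cannot be confirmed from the report alone.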

