MarkDuplicate crash

Bug Report

Affected tool(s)

MarkDuplicates

Affected version(s)

[x] Latest public release version [2.18.1]
[ ] Latest development/master branch as of [date of test?]

Description

This seems to be a common error, already reported several years ago here https://www.biostars.org/p/60263/ and here #72. But since it still crashes and it is not clear to me how to solve it, I am reporting it again. The input BAM file comes from bwa mem; the output was then converted to BAM and sorted with samtools, so it should be valid.

-bash-4.1$ java -jar -Djava.io.tmpdir=./tmp ~grizk/bin/picard.2.18.1.jar MarkDuplicates I=$aln_sorted_BAM O=$aln_sorted_dedup_BAM M=$dedupMetrics ASSUME_SORTED=true
16:41:32.707 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/symbiose/grizk/bin/picard.2.18.1.jar!/com/intel/gkl/native/libgkl_compression.so
[Fri Mar 30 16:41:32 CEST 2018] MarkDuplicates INPUT=[/WORKS/grizk/vc/raw_ch17.aln.sorted.bam] OUTPUT=/WORKS/grizk/vc/raw_ch17.aln.sorted.dedup.bam METRICS_FILE=/WORKS/grizk/vc/raw_ch17.dedupMetrics.txt ASSUME_SORTED=true    MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true ADD_PG_TAG_TO_READS=true REMOVE_DUPLICATES=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Fri Mar 30 16:41:32 CEST 2018] Executing as grizk@cl1n028.genouest.org on Linux 2.6.32-431.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.18.1-SNAPSHOT
INFO	2018-03-30 16:41:32	MarkDuplicates	Start of doWork freeMemory: 2038352320; totalMemory: 2058354688; maxMemory: 28631367680
INFO	2018-03-30 16:41:32	MarkDuplicates	Reading input file and constructing read end information.
INFO	2018-03-30 16:41:32	MarkDuplicates	Will retain up to 103736839 data points before spilling to disk.
WARNING	2018-03-30 16:41:34	AbstractOpticalDuplicateFinderCommandLineProgram	A field field parsed out of a read name was expected to contain an integer and did not. Read name: ERR174324.81165065. Cause: String 'ERR174324.81165065' did not start with a parsable number.

[Fri Mar 30 16:41:38 CEST 2018] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0,10 minutes.
Runtime.totalMemory()=2058354688
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: Value was put into PairInfoMap more than once.  16: null:ERR174326.21638423
	at htsjdk.samtools.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:133)
	at htsjdk.samtools.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:86)
	at picard.sam.markduplicates.util.DiskBasedReadEndsForMarkDuplicatesMap.remove(DiskBasedReadEndsForMarkDuplicatesMap.java:61)
	at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:528)
	at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:232)
	at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:282)
	at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
	at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)
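
The WARNING in the log above says that a field parsed out of the read name was expected to be an integer and was not. The following is a minimal sketch of that kind of check, assuming only what the log states about the default READ_NAME_REGEX (the last three ':' separated fields are read as numeric values). It is not Picard's actual parser, the function name is hypothetical, and the second read name is a made-up Illumina-style example.

```python
def last_three_numeric_fields(read_name):
    """Return the last three ':'-separated fields as ints, or None.

    Illustrative only: a stricter stand-in for the default
    READ_NAME_REGEX behaviour described in the log, not Picard code.
    """
    fields = read_name.split(":")
    if len(fields) < 3:
        # Not enough ':'-separated fields to hold three numeric values.
        return None
    try:
        return tuple(int(f) for f in fields[-3:])
    except ValueError:
        # At least one of the last three fields is not an integer.
        return None


# Read name taken from the WARNING line in the log: it contains no ':'.
print(last_three_numeric_fields("ERR174324.81165065"))

# Made-up Illumina-style name, for contrast only.
print(last_three_numeric_fields("HWI-ST1234:100:C0ABC:1:1101:1234:5678"))
```

Under these assumptions the first call returns None, which matches the warning for ERR-style names, and the second returns three integers. The sketch covers only the warning; it says nothing about the cause of the PairInfoMap exception.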

Steps to reproduce

Command line: java -jar -Djava.io.tmpdir=./tmp ~grizk/bin/picard.2.18.1.jar MarkDuplicates I=$aln_sorted_BAM O=$aln_sorted_dedup_BAM M=$dedupMetrics ASSUME_SORTED=true

Expected behavior

It should not crash.

Actual behavior

It crashes.

Top GitHub Comments

yfarjoun commented, Mar 30, 2018

Your read names (ERR174324.81165065) do not conform to the expected “READ_NAME_REGEX=<optimized capture of last three ‘:’ separated fields as numeric values>”, so optical duplicates cannot be identified. You can either: