MarkDuplicate crash
Bug Report
Affected tool(s)
MarkDuplicates
Affected version(s)
- [x] Latest public release version [2.18.1]
- [ ] Latest development/master branch as of [date of test?]
Description
This seems to be a common error, already reported several years ago here https://www.biostars.org/p/60263/ and here #72. But since it still crashes and it is not clear to me how to solve it, I am reporting it again. The input BAM file comes from bwa mem output that was converted to BAM and sorted with samtools, so it should be valid.
-bash-4.1$ java -jar -Djava.io.tmpdir=./tmp ~grizk/bin/picard.2.18.1.jar MarkDuplicates I=$aln_sorted_BAM O=$aln_sorted_dedup_BAM M=$dedupMetrics ASSUME_SORTED=true
16:41:32.707 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/symbiose/grizk/bin/picard.2.18.1.jar!/com/intel/gkl/native/libgkl_compression.so
[Fri Mar 30 16:41:32 CEST 2018] MarkDuplicates INPUT=[/WORKS/grizk/vc/raw_ch17.aln.sorted.bam] OUTPUT=/WORKS/grizk/vc/raw_ch17.aln.sorted.dedup.bam METRICS_FILE=/WORKS/grizk/vc/raw_ch17.dedupMetrics.txt ASSUME_SORTED=true MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true ADD_PG_TAG_TO_READS=true REMOVE_DUPLICATES=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Fri Mar 30 16:41:32 CEST 2018] Executing as grizk@cl1n028.genouest.org on Linux 2.6.32-431.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.18.1-SNAPSHOT
INFO 2018-03-30 16:41:32 MarkDuplicates Start of doWork freeMemory: 2038352320; totalMemory: 2058354688; maxMemory: 28631367680
INFO 2018-03-30 16:41:32 MarkDuplicates Reading input file and constructing read end information.
INFO 2018-03-30 16:41:32 MarkDuplicates Will retain up to 103736839 data points before spilling to disk.
WARNING 2018-03-30 16:41:34 AbstractOpticalDuplicateFinderCommandLineProgram A field field parsed out of a read name was expected to contain an integer and did not. Read name: ERR174324.81165065. Cause: String 'ERR174324.81165065' did not start with a parsable number.
[Fri Mar 30 16:41:38 CEST 2018] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0,10 minutes.
Runtime.totalMemory()=2058354688
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: Value was put into PairInfoMap more than once. 16: null:ERR174326.21638423
at htsjdk.samtools.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:133)
at htsjdk.samtools.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:86)
at picard.sam.markduplicates.util.DiskBasedReadEndsForMarkDuplicatesMap.remove(DiskBasedReadEndsForMarkDuplicatesMap.java:61)
at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:528)
at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:232)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:282)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)
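The WARNING in the log above says that a field parsed out of the read name was expected to be an integer. As a rough illustration only, assuming the default behaviour is what the log describes (the last three ':'-separated fields of the read name are read as numbers), the Python sketch below checks whether a read name has that shape. The function name is hypothetical, the second example name is a made-up Illumina-style name, and this is not Picard's actual parsing code.

```python
def last_three_fields_numeric(read_name: str) -> bool:
    """Return True if the last three ':'-separated fields are plain integers.

    Assumption: this mirrors the default described in the log
    ("last three ':' separated fields as numeric values"); it is an
    illustration, not Picard's implementation.
    """
    fields = read_name.split(":")
    if len(fields) < 3:
        # Names such as ERR174324.81165065 contain no ':' at all.
        return False
    return all(f.isascii() and f.isdigit() for f in fields[-3:])


if __name__ == "__main__":
    examples = (
        "ERR174324.81165065",                  # read name from the log
        "MACHINE:1:FLOWCELL:1:1101:1235:1936",  # hypothetical Illumina-style name
    )
    for name in examples:
        print(name, last_three_fields_numeric(name))
```

A name that fails this check would be expected to trigger the warning, which concerns optical duplicate detection and is separate from the SAMException that ends the run.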
Steps to reproduce
Command line
java -jar -Djava.io.tmpdir=./tmp ~grizk/bin/picard.2.18.1.jar MarkDuplicates I=$aln_sorted_BAM O=$aln_sorted_dedup_BAM M=$dedupMetrics ASSUME_SORTED=true
Expected behavior
It should not crash.
Actual behavior
It crashes with htsjdk.samtools.SAMException: Value was put into PairInfoMap more than once.
Issue Analytics
- State:
- Created 5 years ago
- Reactions:1
- Comments:8 (4 by maintainers)
Top GitHub Comments
Your read names (ERR174324.81165065) are not conformant to the expected “READ_NAME_REGEX=<optimized capture of last three ‘:’ separated fields as numeric values>”, so optical duplicates cannot be identified. You can set READ_NAME_REGEX=null to disable optical duplicate finding.

Hi, I also have the same read name several times in my BAM file. How should I solve this problem? Thanks, Chi
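The exception message names one read (ERR174326.21638423) that was put into PairInfoMap more than once, which fits the same read name appearing more often than a single pair would allow. A minimal Python sketch for finding such names is given here, assuming the alignments are available as SAM text (for example written by samtools view). It counts primary records per read name and mate, and reports names seen more than once. The function and script names are hypothetical, and this does not reproduce Picard's internal logic.

```python
import sys
from collections import Counter

# Flag bits as defined in the SAM specification.
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def repeated_primary_reads(sam_lines):
    """Return sorted read names with more than one primary record per mate."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # not a SAM record
        name, flag = fields[0], int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # only primary alignments are counted
        mate = flag & (FIRST_IN_PAIR | SECOND_IN_PAIR)
        counts[(name, mate)] += 1
    return sorted({name for (name, _), n in counts.items() if n > 1})


def main(argv):
    # Usage (hypothetical script name): python find_repeats.py reads.sam
    if len(argv) != 2:
        print("usage: find_repeats.py <reads.sam | ->", file=sys.stderr)
        return
    handle = sys.stdin if argv[1] == "-" else open(argv[1])
    with handle:
        for name in repeated_primary_reads(handle):
            print(name)


if __name__ == "__main__":
    main(sys.argv)
```

The counter keeps every read name in memory, so on a large file it may be more practical to run it on one chromosome at a time. Any names it reports are only candidates for inspection. Whether they are the cause of the crash would still need to be confirmed against the input.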