RevertSam, when REMOVE_ALIGNMENT_INFORMATION=true, should not stall on SAM validation errors
E.g., "Read CIGAR M operator maps off end of reference". The point of RevertSam, after all, is to remove subpar alignment information so that the reads can eventually be given a fresh alignment.
Bug Report
Affected tool(s)
RevertSam, when REMOVE_ALIGNMENT_INFORMATION=true
Affected version(s)
- Latest public release version [2.9.4]
Description
- Original user post: http://gatkforums.broadinstitute.org/gatk/discussion/comment/39949#Comment_39949
- Recapitulate user error with test data: https://github.com/broadinstitute/dsde-docs/issues/2231
Steps to reproduce
- Test data:
/humgen/gsa-scr1/pub/incoming/jfiksel_revertsam_bug.zip
- Test command:
java -jar $PICARD RevertSam \
I=PGDX8157T_Ex_snippet.bam \
O=sandbox/PGDX8157T_Ex_u.bam
Expected behavior
Tool reverts reads to unaligned BAM
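The expected behavior follows from what reverting to unaligned means. In a simplified view, removing alignment information just overwrites the alignment columns and drops alignment-derived tags, so the old CIGAR never has to be interpreted at all. The Java sketch below illustrates that idea with a hypothetical Read class (not htsjdk's SAMRecord); the tag list matches the ATTRIBUTE_TO_CLEAR default echoed in the log in this report. It deliberately omits other work RevertSam does, such as strand flipping, mate fields, duplicate flags and original-quality restoration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified model of reverting one record to an unaligned state.
 * Not Picard's RevertSam code: it ignores strand flipping, mate fields,
 * duplicate flags and original-quality restoration.
 */
public class RevertSketch {

    /** Hypothetical minimal record; fields loosely follow SAM columns. */
    public static class Read {
        public String referenceName = "*"; // RNAME ("*" = none)
        public int alignmentStart = 0;     // POS (0 = no position)
        public int mappingQuality = 0;     // MAPQ
        public String cigar = "*";         // CIGAR ("*" = unavailable)
        public boolean unmapped = true;    // FLAG bit 0x4
        public final Map<String, String> attributes = new HashMap<>();
    }

    /** Same tags as the ATTRIBUTE_TO_CLEAR default shown in the RevertSam log. */
    public static final List<String> ATTRIBUTES_TO_CLEAR =
            Arrays.asList("NM", "UQ", "PG", "MD", "MQ", "SA", "MC", "AS");

    /**
     * Overwrites every alignment field. The old CIGAR is never parsed or checked,
     * so a CIGAR that runs off the end of the reference cannot cause a failure here.
     */
    public static void removeAlignmentInformation(Read read) {
        read.referenceName = "*";
        read.alignmentStart = 0;
        read.mappingQuality = 0;
        read.cigar = "*";
        read.unmapped = true;
        for (String tag : ATTRIBUTES_TO_CLEAR) {
            read.attributes.remove(tag);
        }
    }

    public static void main(String[] args) {
        Read read = new Read();
        read.referenceName = "chr17";
        read.alignmentStart = 951;            // hypothetical position
        read.cigar = "101M";                  // may run past the contig end; irrelevant here
        read.unmapped = false;
        read.mappingQuality = 60;
        read.attributes.put("NM", "0");       // alignment-derived tag: cleared
        read.attributes.put("RG", "sample1"); // read group: kept
        removeAlignmentInformation(read);
        System.out.println(read.cigar + " " + read.alignmentStart + " " + read.attributes.keySet()); // * 0 [RG]
    }
}
```

In the real tool the failure happens earlier, while the input record is being read and validated, which is why the output side of this logic never gets a chance to run.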
Actual behavior
Error message:
WMCF9-CB5:jfiksel_revertsam_error shlee$ java -jar $PICARD RevertSam I=PGDX8157T_Ex_snippet.bam O=sandbox/PGDX8157T_Ex_u.bam
[Fri Jun 30 11:17:58 EDT 2017] picard.sam.RevertSam INPUT=PGDX8157T_Ex_snippet.bam OUTPUT=sandbox/PGDX8157T_Ex_u.bam OUTPUT_BY_READGROUP=false OUTPUT_BY_READGROUP_FILE_FORMAT=dynamic SORT_ORDER=queryname RESTORE_ORIGINAL_QUALITIES=true REMOVE_DUPLICATE_INFORMATION=true REMOVE_ALIGNMENT_INFORMATION=true ATTRIBUTE_TO_CLEAR=[NM, UQ, PG, MD, MQ, SA, MC, AS] SANITIZE=false MAX_DISCARD_FRACTION=0.01 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Fri Jun 30 11:17:58 EDT 2017] Executing as shlee@WMCF9-CB5 on Mac OS X 10.11.6 x86_64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_111-b14; Picard version: 2.9.4-SNAPSHOT
INFO 2017-06-30 11:18:04 RevertSam Reverted 1,000,000 records. Elapsed time: 00:00:05s. Time for last 1,000,000: 5s. Last read position: chr17:61,684,897
INFO 2017-06-30 11:18:10 RevertSam Reverted 2,000,000 records. Elapsed time: 00:00:12s. Time for last 1,000,000: 6s. Last read position: chr17:72,477,335
INFO 2017-06-30 11:18:16 RevertSam Reverted 3,000,000 records. Elapsed time: 00:00:18s. Time for last 1,000,000: 5s. Last read position: chr17:76,120,913
INFO 2017-06-30 11:18:22 RevertSam Reverted 4,000,000 records. Elapsed time: 00:00:24s. Time for last 1,000,000: 6s. Last read position: chr17:80,046,790
[Fri Jun 30 11:18:24 EDT 2017] picard.sam.RevertSam done. Elapsed time: 0.44 minutes.
Runtime.totalMemory()=1670381568
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Read name HWI-D00743_115_5_2112_18001_6554_0:0:0:0:0, Read CIGAR M operator maps off end of reference
at htsjdk.samtools.SAMUtils.processValidationErrors(SAMUtils.java:451)
at htsjdk.samtools.BAMRecord.getCigar(BAMRecord.java:253)
at htsjdk.samtools.SAMRecord.getAlignmentEnd(SAMRecord.java:603)
at htsjdk.samtools.SAMRecord.computeIndexingBin(SAMRecord.java:1547)
at htsjdk.samtools.SAMRecord.isValid(SAMRecord.java:2054)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:811)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:797)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:765)
at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:576)
at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:548)
at picard.sam.RevertSam.doWork(RevertSam.java:246)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)
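The stack trace shows where the exception comes from: with VALIDATION_STRINGENCY=STRICT, htsjdk validates each record as the BAM iterator advances (SAMRecord.isValid), which computes the indexing bin, which needs the alignment end, which decodes and checks the CIGAR. All of this happens before RevertSam.doWork receives the record and can discard its alignment. The check itself is simple arithmetic: the 1-based alignment end is the start plus the number of reference bases the CIGAR consumes (M, D, N, = and X), minus one, and the read "maps off end of reference" if that end exceeds the contig length in the header. The Java sketch below is a simplified illustration of that check, not htsjdk's implementation; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Simplified sketch (not htsjdk's code) of the check behind
 * "Read CIGAR M operator maps off end of reference".
 */
public class CigarBoundsCheck {
    // One CIGAR element: a length followed by a SAM-spec operator.
    private static final Pattern ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Reference bases consumed by the CIGAR: only M, D, N, = and X consume reference. */
    public static int referenceLength(String cigar) {
        Matcher m = ELEMENT.matcher(cigar);
        int consumed = 0;
        int pos = 0;
        while (m.find()) {
            // Elements must be contiguous; anything in between means a malformed CIGAR.
            if (m.start() != pos) throw new IllegalArgumentException("Malformed CIGAR: " + cigar);
            pos = m.end();
            char op = m.group(2).charAt(0);
            if ("MDN=X".indexOf(op) >= 0) consumed += Integer.parseInt(m.group(1));
        }
        if (pos == 0 || pos != cigar.length()) throw new IllegalArgumentException("Malformed CIGAR: " + cigar);
        return consumed;
    }

    /** 1-based, inclusive alignment end for a mapped read. */
    public static int alignmentEnd(int alignmentStart, String cigar) {
        return alignmentStart + referenceLength(cigar) - 1;
    }

    /** True if the aligned span runs past the last base of a contig of the given length. */
    public static boolean mapsOffEndOfReference(int alignmentStart, String cigar, int contigLength) {
        return alignmentEnd(alignmentStart, cigar) > contigLength;
    }

    public static void main(String[] args) {
        int contigLength = 1000; // hypothetical contig length from the header
        int start = 951;         // hypothetical read start, 50 bases before the contig end
        System.out.println(alignmentEnd(start, "101M"));                          // 1051
        System.out.println(mapsOffEndOfReference(start, "101M", contigLength));   // true
        System.out.println(mapsOffEndOfReference(start, "50M51S", contigLength)); // false
    }
}
```

Soft-clipping the overhang (50M51S) keeps the same read bases but stops consuming reference at the contig end, so the record passes. Until the fix referenced in the comments, relaxing VALIDATION_STRINGENCY (LENIENT or SILENT) would likely avoid the exception, at the cost of suppressing other validation errors too.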
Top GitHub Comments
@jfiksel The fix has been merged to master.
Closing since #856 and #858 are merged.