Strange false positive call
Hello (me again, sorry). I am eyeballing BAM files to evaluate candidates. Mostly it’s fine (I open a lot of issues, but I want to stress that most reported variants seem correct). But here is a case that looks very, very strange to me.
Here is the gVCF line:
chromosome_1 17434065 . C T,<*> 29.1 PASS . GT:GQ:DP:AD:VAF:PL 0/1:29:64:38,26,0:0.40625,0:29,0,99,990,990,990
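As a quick sanity check on that record, the sample column can be unpacked and the alternate-allele fraction recomputed from AD and DP. This is a minimal sketch, assuming a single-sample, whitespace-separated record as pasted above; `parse_sample` is a hypothetical helper name, not part of any DeepVariant tooling.

```python
def parse_sample(line):
    """Split one single-sample VCF/gVCF record into its core fields.

    Assumes whitespace-separated columns in the standard VCF order:
    CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE.
    """
    fields = line.split()
    chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
    keys = fields[8].split(":")      # FORMAT keys, e.g. GT, GQ, DP, AD ...
    values = fields[9].split(":")    # matching values for the one sample
    return chrom, pos, ref, alt.split(","), dict(zip(keys, values))


line = ("chromosome_1 17434065 . C T,<*> 29.1 PASS . "
        "GT:GQ:DP:AD:VAF:PL 0/1:29:64:38,26,0:0.40625,0:29,0,99,990,990,990")

chrom, pos, ref, alts, sample = parse_sample(line)
ad = [int(x) for x in sample["AD"].split(",")]   # depth per allele: REF, T, <*>
dp = int(sample["DP"])
alt_fraction = ad[1] / dp                         # reads supporting T / total depth

print(chrom, pos, ref, alts[0], ad, dp, alt_fraction)
```

Here 26 of 64 reads are reported as supporting T, which matches the VAF of 0.40625 in the record, so the record is internally consistent even though the displayed alignment shows no T at that position.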
That looks all right to me; it seems like a “solid” candidate. But now, let’s look at the BAM.
The relevant position is the first C starting from the right. As you can see, there is not a single T base there. The site seems perfectly homozygous C/C.
However, a bit to the left, many of the mapped reads end abruptly at the same position with a T. It’s not a variant, but something strange seems to be happening in that region.
So far it’s the only such case I have. Do you have any idea of what’s going on?
Here are the lines from the gVCF before the site. The abrupt T position is at 17434056, so it falls in a block with no candidate. But right after it, over a few bases, DeepVariant made several calls that are not supported by the mapping at all.
zgrep -w -C8 "17434065" H4A4.g.vcf.gz
chromosome_1 17434049 . T G,<*> 29.2 PASS . GT:GQ:DP:AD:VAF:PL 0/1:29:62:36,25,0:0.403226,0:29,0,99,990,990,990
chromosome_1 17434050 . G <*> 0 . END=17434056 GT:GQ:MIN_DP:PL 0/0:50:59:0,186,1859
chromosome_1 17434057 . G T,<*> 25.7 PASS . GT:GQ:DP:AD:VAF:PL 0/1:26:59:33,26,0:0.440678,0:25,0,71,990,990,990
chromosome_1 17434058 . T <*> 0 . END=17434058 GT:GQ:MIN_DP:PL 0/0:50:63:0,189,1889
chromosome_1 17434059 . G T,<*> 25 PASS . GT:GQ:DP:AD:VAF:PL 0/1:25:63:37,26,0:0.412698,0:25,0,77,990,990,990
chromosome_1 17434060 . A <*> 0 . END=17434062 GT:GQ:MIN_DP:PL 0/0:50:64:0,165,1889
chromosome_1 17434063 . G C,<*> 26.2 PASS . GT:GQ:DP:AD:VAF:PL 0/1:26:64:38,25,0:0.390625,0:26,0,99,990,990,990
chromosome_1 17434064 . A <*> 0 . END=17434064 GT:GQ:MIN_DP:PL 0/0:50:64:0,192,1919
chromosome_1 17434065 . C T,<*> 29.1 PASS . GT:GQ:DP:AD:VAF:PL 0/1:29:64:38,26,0:0.40625,0:29,0,99,990,990,990
Any idea of what’s going on there? It’s a bit annoying, in the sense that I don’t know how I could have caught that without eyeballing the alignment. In my experiment I plan to eyeball all my candidates anyway (because there are fewer than 100 of them).
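One way such sites might be surfaced without looking at every alignment is to flag dense clusters of variant calls, since the excerpt above has five heterozygous calls within 17 bp. The sketch below is only a heuristic under stated assumptions: the function names, the 20 bp window and the minimum of 3 calls are all hypothetical choices, and the records are the ones pasted above. Dense clusters can also be genuine (for example on a divergent haplotype), so this marks sites for review rather than filtering them.

```python
RECORDS = """\
chromosome_1 17434049 . T G,<*> 29.2 PASS . GT:GQ:DP:AD:VAF:PL 0/1:29:62:36,25,0:0.403226,0:29,0,99,990,990,990
chromosome_1 17434050 . G <*> 0 . END=17434056 GT:GQ:MIN_DP:PL 0/0:50:59:0,186,1859
chromosome_1 17434057 . G T,<*> 25.7 PASS . GT:GQ:DP:AD:VAF:PL 0/1:26:59:33,26,0:0.440678,0:25,0,71,990,990,990
chromosome_1 17434058 . T <*> 0 . END=17434058 GT:GQ:MIN_DP:PL 0/0:50:63:0,189,1889
chromosome_1 17434059 . G T,<*> 25 PASS . GT:GQ:DP:AD:VAF:PL 0/1:25:63:37,26,0:0.412698,0:25,0,77,990,990,990
chromosome_1 17434060 . A <*> 0 . END=17434062 GT:GQ:MIN_DP:PL 0/0:50:64:0,165,1889
chromosome_1 17434063 . G C,<*> 26.2 PASS . GT:GQ:DP:AD:VAF:PL 0/1:26:64:38,25,0:0.390625,0:26,0,99,990,990,990
chromosome_1 17434064 . A <*> 0 . END=17434064 GT:GQ:MIN_DP:PL 0/0:50:64:0,192,1919
chromosome_1 17434065 . C T,<*> 29.1 PASS . GT:GQ:DP:AD:VAF:PL 0/1:29:64:38,26,0:0.40625,0:29,0,99,990,990,990
"""


def variant_positions(gvcf_lines):
    """Return (chrom, pos) for records with a real ALT allele and a non-reference genotype."""
    calls = []
    for line in gvcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        real_alts = [a for a in fields[4].split(",") if a != "<*>"]
        genotype = fields[9].split(":")[0]
        if real_alts and genotype not in ("0/0", "./."):
            calls.append((fields[0], int(fields[1])))
    return calls


def flag_clusters(calls, window=20, min_calls=3):
    """Group calls lying within `window` bp of the first call in the group.

    Only groups with at least `min_calls` members are returned.
    Both thresholds are arbitrary illustration values.
    """
    clusters, current = [], []
    for chrom, pos in sorted(calls):
        if current and (chrom != current[0][0] or pos - current[0][1] > window):
            if len(current) >= min_calls:
                clusters.append(current)
            current = []
        current.append((chrom, pos))
    if len(current) >= min_calls:
        clusters.append(current)
    return clusters


calls = variant_positions(RECORDS.splitlines())
clusters = flag_clusters(calls)
for cluster in clusters:
    print(cluster[0][0], [pos for _, pos in cluster])
```

On the excerpt this reports a single cluster covering positions 17434049, 17434057, 17434059, 17434063 and 17434065, which is the region where the reads end abruptly.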
Issue Analytics
- Created 4 years ago
- Comments:9 (1 by maintainers)
Top GitHub Comments
Hello, how are we supposed to pass those arguments with the one-step wrapper script?
I tried
but runs into
EDIT: ah ok, the flag must explicitly be set to “=true” ^^’
Hi @aderzelle. If the question is why a variant is getting called when you see no sign of it in the original BAM, then that can only be answered by inspecting the realigned BAM, not by additional sequencing. Given the split reads, it seems likely that you are seeing the signal of some kind of misassembly or structural variant (if the genome that produced the BAM and the reference are not the same). If the realigned BAM does not answer your question, you can send the files to marianattestad@google.com and I can take a look. I would need the original BAM, the realigned BAM, the reference genome, and the position of the variant you are asking about.