issues with complex haplotypes
Hi, I am using DV 0.9 to find de novo variants in trios, so I am enriching for weird stuff. I am sure your team is aware of some or all of these, but I’ll document at least 1 case here for the record.
Nearly all obvious false positive de novo calls follow the same pattern. Mostly, there is a haplotype from mom and a haplotype from dad that are different. The kid inherits both of the variable haplotypes, but the combination of DV and GLnexus genotypes them such that the variant appears as de novo.
But, here is an example where there is just an incorrect call from DV that leads to 3 neighboring spurious de novo calls in the kid (top row):
Note that dad, in the 3rd row, has a single read with a 1-base deletion (dash) followed by an insertion (purple tick). I can’t show all of the reads in this image, but I have scrolled through and verified that it is the only such read.
here is the content of the dad’s VCF for that region (the mom’s is actually very similar):
chr8 75144980 . CT C 44.7 PASS . GT:GQ:DP:AD:VAF:PL 0/1:44:34:16,18:0.529412:44,0,53
chr8 75144983 . T TG 49.5 PASS . GT:GQ:DP:AD:VAF:PL 0/1:49:35:18,17:0.485714:49,0,64
Note that these are the single-base deletion and the insertion that occur in only 1 read. Since mom’s is the same, maybe there’s something akin to realignment going on. By contrast, here is the kid’s (seemingly more sensible) VCF for that region:
chr8 75144981 . T A 71.9 PASS . GT:GQ:DP:AD:VAF:PL 0/1:66:27:11,16:0.592593:71,0,67
chr8 75144982 . A T 63.6 PASS . GT:GQ:DP:AD:VAF:PL 0/1:61:27:11,16:0.592593:63,0,63
chr8 75144983 . T G 67.9 PASS . GT:GQ:DP:AD:VAF:PL 0/1:66:27:11,16:0.592593:67,0,71
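To make the failure mode concrete, here is a minimal sketch of a naive de novo screen run on the records above. It treats a kid allele as inherited only when a parent has a record with the identical (chrom, pos, ref, alt) key. The helper `variant_keys` is hypothetical and is not how DeepVariant or GLnexus work internally; only dad's records are used, on the assumption (stated above) that mom's are very similar.

```python
# Naive de novo screen: exact matching on (chrom, pos, ref, alt).
# Hypothetical illustration only; not the DeepVariant or GLnexus logic.

def variant_keys(vcf_lines):
    """Return the set of (chrom, pos, ref, alt) keys from whitespace-split VCF lines."""
    keys = set()
    for line in vcf_lines:
        chrom, pos, _id, ref, alt = line.split()[:5]
        keys.add((chrom, int(pos), ref, alt))
    return keys

dad_vcf = [
    "chr8 75144980 . CT C 44.7 PASS . GT:GQ:DP:AD:VAF:PL 0/1:44:34:16,18:0.529412:44,0,53",
    "chr8 75144983 . T TG 49.5 PASS . GT:GQ:DP:AD:VAF:PL 0/1:49:35:18,17:0.485714:49,0,64",
]
kid_vcf = [
    "chr8 75144981 . T A 71.9 PASS . GT:GQ:DP:AD:VAF:PL 0/1:66:27:11,16:0.592593:71,0,67",
    "chr8 75144982 . A T 63.6 PASS . GT:GQ:DP:AD:VAF:PL 0/1:61:27:11,16:0.592593:63,0,63",
    "chr8 75144983 . T G 67.9 PASS . GT:GQ:DP:AD:VAF:PL 0/1:66:27:11,16:0.592593:67,0,71",
]

# Assumption: mom's calls match dad's, so dad's keys stand in for both parents.
parent_keys = variant_keys(dad_vcf)
candidate_de_novo = sorted(variant_keys(kid_vcf) - parent_keys)

for key in candidate_de_novo:
    print(key)
```

None of the kid's three keys match a parental key (even at 75144983 the ALT differs, G vs. TG), so all 3 sites get flagged under this kind of exact matching.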
here is the content of the gvcf for dad:
chr8 75144980 . CT C,<*> 44.7 PASS . GT:GQ:DP:AD:VAF:PL 0/1:44:34:16,18,0:0.529412,0:44,0,53,990,990,990
chr8 75144982 . A <*> 0 . END=75144982 GT:GQ:MIN_DP:PL 0/0:48:16:0,48,479
chr8 75144983 . T TG,<*> 49.5 PASS . GT:GQ:DP:AD:VAF:PL 0/1:49:35:18,17,0:0.485714,0:49,0,64,990,990,990
chr8 75144984 . G <*> 0 . END=75145000 GT:GQ:MIN_DP:PL 0/0:50:31:0,105,1049
and kid:
chr8 75144981 . T A,<*> 71.9 PASS . GT:GQ:DP:AD:VAF:PL 0/1:66:27:11,16,0:0.592593,0:71,0,67,990,990,990
chr8 75144982 . A T,<*> 63.6 PASS . GT:GQ:DP:AD:VAF:PL 0/1:61:27:11,16,0:0.592593,0:63,0,63,990,990,990
chr8 75144983 . T G,<*> 67.9 PASS . GT:GQ:DP:AD:VAF:PL 0/1:66:27:11,16,0:0.592593,0:67,0,71,990,990,990
chr8 75144984 . G <*> 0 . END=75145000 GT:GQ:MIN_DP:PL 0/0:50:25:0,75,749
I am attaching small SAM files for kid and dad, aligned to hg38: kid.sam.gz dad.sam.gz
I have other scenarios, but this one seems clearly a DeepVariant issue and not a problem with GLnexus.
Top GitHub Comments
Hi Andrew, thanks for the reply. I hadn’t appreciated the extent of this problem before either, not just in DV but everywhere. The “kid” and “dad” variants I posted above can be made into the same variant set if we know they are on the same haplotype, and running with the window selection off does this. But it still creates a problem with annotating across different call-sets when different representations are used: gnomAD prefers the representation of an insertion and a deletion rather than the 3 individual SNPs.
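As a quick check of the equivalence, the sketch below applies each sample's calls to the local reference and compares the resulting alternate haplotypes. The reference window `CTATG` (chr8:75144980-75144984) is pieced together from the REF columns of the VCF/gVCF records posted above, and the helper `apply_variants` is hypothetical. It also assumes that all variants within a sample sit on the same haplotype (in cis), which the unphased 0/1 genotypes do not prove.

```python
# Compare the alternate haplotypes implied by dad's and kid's calls.
# Assumptions: REF window reconstructed from the posted records; variants
# within each sample are in cis. Hypothetical helper, for illustration only.

REF_START = 75144980
REF = "CTATG"  # chr8:75144980-75144984 on hg38, per the REF columns above

def apply_variants(ref, ref_start, variants):
    """Apply non-overlapping (pos, ref_allele, alt_allele) variants to one haplotype."""
    out = []
    cursor = 0  # index into ref of the first base not yet emitted
    for pos, ref_allele, alt_allele in sorted(variants):
        offset = pos - ref_start
        if offset < cursor:
            raise ValueError(f"variant at {pos} overlaps a previous variant")
        if ref[offset:offset + len(ref_allele)] != ref_allele:
            raise ValueError(f"REF allele mismatch at {pos}")
        out.append(ref[cursor:offset])   # unchanged bases before the variant
        out.append(alt_allele)           # substituted allele
        cursor = offset + len(ref_allele)
    out.append(ref[cursor:])             # unchanged tail
    return "".join(out)

dad_variants = [(75144980, "CT", "C"), (75144983, "T", "TG")]
kid_variants = [(75144981, "T", "A"), (75144982, "A", "T"), (75144983, "T", "G")]

dad_hap = apply_variants(REF, REF_START, dad_variants)
kid_hap = apply_variants(REF, REF_START, kid_variants)

print(dad_hap, kid_hap, dad_hap == kid_hap)  # CATGG CATGG True
```

Under those assumptions, the deletion plus insertion and the 3 SNPs spell the same sequence, so the disagreement is one of representation rather than of underlying alleles.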
So, even if this is further resolved in DV (running with ws_use_window_selector_model=false is sufficient for me), it will be impossible, in cases like this, to correctly annotate across cohorts without phasing information.
Edit: feel free to close this issue, as my original issue is addressed.
@kokyriakidis to your question: ws_use_window_selector_model was added to improve runtime, not to improve accuracy or sensitivity. But empirically we expect the trade-off on accuracy to be small. Below you can see my comparison between turning it on and off for WGS and WES on v0.9. (In the PACBIO setting, this doesn’t affect anything because we don’t run the realigner for PACBIO.)
I ran some numbers based on the v0.9 WGS and WES case study data. I used this type of CPU machine for the runtime.
The following results were done in two settings:
BASE: This is the default (i.e., ws_use_window_selector_model is true)
EXPT: Turn off window selector (i.e., set ws_use_window_selector_model to false)
On WGS case study: [Table: make_examples runtime for BASE and EXPT]
On WES case study: [Table: make_examples runtime for BASE and EXPT]
For the detailed hap.py output, you can see here.