Issues with complex haplotypes

Hi, I am using DeepVariant (DV) v0.9 to find de novo variants in trios, so I am enriching for weird stuff. I am sure your team is aware of some or all of these, but I’ll document at least 1 case here for the record.

Nearly all obvious false-positive de novo calls follow the same pattern. Mostly, mom and dad each contribute a haplotype, and the two haplotypes differ. The kid inherits both variable haplotypes, but DV and GLnexus together genotype the kid such that the variant appears de novo.

But here is an example where an incorrect call from DV leads to 3 neighboring spurious de novo calls in the kid (top row):

[image: bad-dn]

Note that dad (3rd row) has a single read with a 1-base deletion (dash) followed by an insertion (purple tick). I can’t show all of the reads in this image, but I have scrolled through and verified that it is the only such read.

Here is the dad’s VCF for that region (the mom’s is actually very similar):

chr8	75144980	.	CT	C	44.7	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:44:34:16,18:0.529412:44,0,53
chr8	75144983	.	T	TG	49.5	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:49:35:18,17:0.485714:49,0,64
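The VAF column can be cross-checked against AD and DP. In both of the dad’s records, VAF equals the alt-supporting count in AD divided by DP (18/34 and 17/35). The sketch below recomputes it. Treating VAF as alt AD / DP is an assumption inferred from these numbers, not a documented definition. The check also makes one detail visible: DV counts 18 reads supporting the CT>C deletion even though only one read shows it in the raw alignments, which is consistent with reads being reassigned during realignment. `parse_record` and `vaf_check` are hypothetical helpers.

```python
# The dad's two VCF records from this issue. Fields are separated by single
# spaces here for readability; str.split() accepts tabs or spaces.
DAD_RECORDS = [
    "chr8 75144980 . CT C 44.7 PASS . GT:GQ:DP:AD:VAF:PL 0/1:44:34:16,18:0.529412:44,0,53",
    "chr8 75144983 . T TG 49.5 PASS . GT:GQ:DP:AD:VAF:PL 0/1:49:35:18,17:0.485714:49,0,64",
]


def parse_record(line):
    """Split a single-sample VCF line into (pos, ref, alt, sample_fields)."""
    fields = line.split()
    pos, ref, alt = int(fields[1]), fields[3], fields[4]
    # Pair FORMAT keys (GT, GQ, ...) with the sample's values.
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    return pos, ref, alt, sample


def vaf_check(line):
    """Return (reported VAF, alt AD / DP) for one record.

    Assumption: VAF = alt-supporting reads / DP. This matches the numbers in
    this issue but is not taken from DeepVariant documentation.
    """
    _, _, _, sample = parse_record(line)
    alt_reads = int(sample["AD"].split(",")[1])
    return float(sample["VAF"]), alt_reads / int(sample["DP"])


for line in DAD_RECORDS:
    pos, ref, alt, sample = parse_record(line)
    reported, recomputed = vaf_check(line)
    print(f"{pos} {ref}>{alt}: AD={sample['AD']} DP={sample['DP']} "
          f"VAF={reported} alt/DP={recomputed:.6f}")
```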

Note that these dad calls are the single-base deletion and the insertion that appear in only 1 read. Since the mom’s calls are the same, maybe something akin to realignment is going on. By contrast, here is the kid’s (seemingly more sensible) VCF for that region:

chr8	75144981	.	T	A	71.9	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:66:27:11,16:0.592593:71,0,67
chr8	75144982	.	A	T	63.6	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:61:27:11,16:0.592593:63,0,63
chr8	75144983	.	T	G	67.9	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:66:27:11,16:0.592593:67,0,71
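Although the dad’s and kid’s calls look unrelated record by record, they can spell the same sequence. The sketch below assumes that all of the dad’s alt alleles lie on one haplotype and all of the kid’s lie on one haplotype. This is an assumption, because the 0/1 genotypes here are unphased. Under it, applying each call set to the reference gives the same result. The reference window `CTATG` is read off the REF columns above. `apply_variants` is a hypothetical helper, not part of any DeepVariant or GLnexus API.

```python
# Reference window chr8:75144980-75144984 (1-based), read off the REF columns
# of the VCF records above: C T A T G.
REF_START = 75144980
REF_SEQ = "CTATG"

# (pos, ref, alt) calls from the VCFs above.
DAD = [(75144980, "CT", "C"), (75144983, "T", "TG")]
KID = [(75144981, "T", "A"), (75144982, "A", "T"), (75144983, "T", "G")]


def apply_variants(ref_seq, ref_start, variants):
    """Hypothetical helper: apply non-overlapping variants, assumed to lie on
    one haplotype, to a reference window and return the resulting sequence."""
    pieces = []
    cursor = 0  # index of the first reference base not yet emitted
    for pos, ref, alt in sorted(variants):
        offset = pos - ref_start
        if offset < cursor:
            raise ValueError(f"variant at {pos} overlaps the previous one")
        if ref_seq[offset:offset + len(ref)] != ref:
            raise ValueError(f"REF {ref!r} does not match reference at {pos}")
        pieces.append(ref_seq[cursor:offset])  # unchanged bases before the variant
        pieces.append(alt)                      # the ALT allele replaces REF
        cursor = offset + len(ref)
    pieces.append(ref_seq[cursor:])             # unchanged bases after the last variant
    return "".join(pieces)


dad_hap = apply_variants(REF_SEQ, REF_START, DAD)
kid_hap = apply_variants(REF_SEQ, REF_START, KID)
print(dad_hap, kid_hap, dad_hap == kid_hap)
```

Both call sets produce `CATGG`. Deleting the T at 75144981 and inserting a G after the T at 75144983 yields the same bases as the three adjacent substitutions.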

Here is the dad’s gVCF for the same region:

chr8	75144980	.	CT	C,<*>	44.7	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:44:34:16,18,0:0.529412,0:44,0,53,990,990,990
chr8	75144982	.	A	<*>	0	.	END=75144982	GT:GQ:MIN_DP:PL	0/0:48:16:0,48,479
chr8	75144983	.	T	TG,<*>	49.5	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:49:35:18,17,0:0.485714,0:49,0,64,990,990,990
chr8	75144984	.	G	<*>	0	.	END=75145000	GT:GQ:MIN_DP:PL	0/0:50:31:0,105,1049

And the kid’s:

chr8	75144981	.	T	A,<*>	71.9	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:66:27:11,16,0:0.592593,0:71,0,67,990,990,990
chr8	75144982	.	A	T,<*>	63.6	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:61:27:11,16,0:0.592593,0:63,0,63,990,990,990
chr8	75144983	.	T	G,<*>	67.9	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:66:27:11,16,0:0.592593,0:67,0,71,990,990,990
chr8	75144984	.	G	<*>	0	.	END=75145000	GT:GQ:MIN_DP:PL	0/0:50:25:0,75,749
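Comparing the two gVCFs position by position helps show why the kid’s SNPs can look de novo. At each of the kid’s three positions, the dad’s gVCF holds either a reference block (ALT is only `<*>`, with an `END=` in INFO) or an indel record. It never holds the kid’s SNP allele. The sketch below is a simplified lookup under that framing, not a model of how GLnexus actually merges records. `parse_gvcf` and `describe_parent_at` are hypothetical helpers.

```python
# The dad's gVCF records from this issue. Fields are separated by single spaces
# here for readability; str.split() accepts tabs or spaces.
DAD_GVCF = [
    "chr8 75144980 . CT C,<*> 44.7 PASS . GT:GQ:DP:AD:VAF:PL 0/1:44:34:16,18,0:0.529412,0:44,0,53,990,990,990",
    "chr8 75144982 . A <*> 0 . END=75144982 GT:GQ:MIN_DP:PL 0/0:48:16:0,48,479",
    "chr8 75144983 . T TG,<*> 49.5 PASS . GT:GQ:DP:AD:VAF:PL 0/1:49:35:18,17,0:0.485714,0:49,0,64,990,990,990",
    "chr8 75144984 . G <*> 0 . END=75145000 GT:GQ:MIN_DP:PL 0/0:50:31:0,105,1049",
]

# The kid's three SNP calls (pos, ref, alt).
KID_SNPS = [(75144981, "T", "A"), (75144982, "A", "T"), (75144983, "T", "G")]


def parse_gvcf(line):
    """Return (start, end, ref, alts, sample) for one gVCF record.

    `end` is the last reference base covered: END= for reference blocks,
    otherwise POS + len(REF) - 1. The symbolic <*> allele is dropped from `alts`.
    """
    f = line.split()
    start, ref = int(f[1]), f[3]
    alts = [a for a in f[4].split(",") if a != "<*>"]
    info = dict(kv.split("=", 1) for kv in f[7].split(";") if "=" in kv)
    end = int(info.get("END", start + len(ref) - 1))
    sample = dict(zip(f[8].split(":"), f[9].split(":")))
    return start, end, ref, alts, sample


def describe_parent_at(records, pos):
    """Describe which parental gVCF record covers `pos`, if any."""
    for start, end, ref, alts, sample in map(parse_gvcf, records):
        if start <= pos <= end:
            if not alts:  # only <*>: a reference block
                return f"reference block {start}-{end} GT={sample['GT']} GQ={sample['GQ']}"
            return f"variant {start} {ref}>{','.join(alts)} GT={sample['GT']}"
    return "not covered"


for pos, ref, alt in KID_SNPS:
    print(f"{pos} kid {ref}>{alt} | dad: {describe_parent_at(DAD_GVCF, pos)}")
```

At 75144982, for example, the dad is reported as a confident reference block (GQ 48) where the kid is called A>T.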

I am attaching small SAM files for the kid and dad, aligned to hg38: kid.sam.gz, dad.sam.gz

I have other scenarios, but this one seems clearly a DeepVariant issue and not a problem with GLnexus.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 14

Top GitHub Comments

brentp commented, Feb 19, 2020 (1 reaction)

Hi Andrew, thanks for the reply. I hadn’t appreciated the extent of this problem before either, and not just in DV but everywhere. The “kid” and “dad” variants I posted above can be made into the same variant set if we know they are on the same haplotype, and running with the window selector off does this. But it still creates a problem when annotating across call sets that use different representations: gnomAD prefers representing this as an insertion and a deletion rather than as 3 individual SNPs.

So even if this is resolved further in DV (running with ws_use_window_selector_model=false is sufficient for me), in cases like this it will still be impossible to annotate correctly across cohorts without phasing information.
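The phasing point can be made concrete with the kid’s three unphased heterozygous SNPs. Fixing the first SNP on one haplotype leaves 2^(3-1) = 4 distinct phasings, and the sketch below enumerates them. It assumes the dad’s CT>C deletion and T>TG insertion are in cis, which would give him the haplotypes `CATGG` and `CTATG`. The sketch then checks which phasings of the kid’s SNPs produce that same pair. The reference window is read off the REF columns in this issue, and `haplotype_pairs` is a hypothetical helper.

```python
from itertools import product

# Reference window chr8:75144980-75144984 (1-based), from the REF columns above.
REF_START = 75144980
REF_SEQ = "CTATG"

# The kid's three unphased heterozygous SNPs (pos, ref, alt).
KID_SNPS = [(75144981, "T", "A"), (75144982, "A", "T"), (75144983, "T", "G")]

# Assumption: the dad's deletion and insertion are in cis, so one copy carries
# both events (CATGG) and the other copy is reference (CTATG).
DAD_HAPLOTYPES = {"CATGG", "CTATG"}


def haplotype_pairs(ref_seq, ref_start, snps):
    """Yield (assignment, {hap_a, hap_b}) for every distinct phasing of het SNPs.

    assignment[i] says which haplotype (0 or 1) carries SNP i. Fixing SNP 0 on
    haplotype 0 removes mirror-image duplicates.
    """
    for rest in product((0, 1), repeat=len(snps) - 1):
        assignment = (0,) + rest
        haps = [list(ref_seq), list(ref_seq)]
        for (pos, ref, alt), hap_index in zip(snps, assignment):
            offset = pos - ref_start
            if ref_seq[offset] != ref:
                raise ValueError(f"REF mismatch at {pos}")
            haps[hap_index][offset] = alt
        yield assignment, {"".join(haps[0]), "".join(haps[1])}


matches = [assignment
           for assignment, pair in haplotype_pairs(REF_SEQ, REF_START, KID_SNPS)
           if pair == DAD_HAPLOTYPES]
print(matches)
```

Only the all-in-cis phasing `(0, 0, 0)` reproduces the dad’s haplotypes. Without phase information, the three SNPs cannot be reconciled with a single insertion plus deletion.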

Edit: feel free to close this issue, as my original problem has been addressed.

pichuan commented, Feb 17, 2020 (1 reaction)

@kokyriakidis, to answer your question: ws_use_window_selector_model is enabled to improve runtime, not accuracy or sensitivity. Empirically, though, we expect the accuracy trade-off to be small. Below is my comparison of turning it on and off for WGS and WES on v0.9. (With the PACBIO setting it has no effect, because we don’t run the realigner for PACBIO.)


I ran some numbers on the v0.9 WGS and WES case study data, measuring runtime on a CPU machine.

The results below compare two settings:

  • BASE: the default (ws_use_window_selector_model is true)
  • EXPT: window selector turned off (ws_use_window_selector_model set to false)

On the WGS case study:

| Settings | make_examples runtime | Indel F1 | SNP F1   |
|----------|-----------------------|----------|----------|
| BASE     | 81m29.320s            | 0.998112 | 0.999633 |
| EXPT     | 107m3.893s            | 0.998156 | 0.999642 |

On the WES case study:

| Settings | make_examples runtime | Indel F1 | SNP F1   |
|----------|-----------------------|----------|----------|
| BASE     | 10m5.515s             | 0.973295 | 0.999318 |
| EXPT     | 19m25.869s            | 0.974056 | 0.999362 |

For the detailed hap.py output, you can see here.
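To put the two tables on a common scale, the sketch below converts the make_examples runtimes to seconds. It then reports the EXPT/BASE runtime ratio and the F1 differences. The values are copied from the tables. `to_seconds` assumes the `XmY.YYYs` format used there, and both helper names are hypothetical.

```python
import re

# make_examples runtime, Indel F1, SNP F1, copied from the two tables above.
RESULTS = {
    "WGS": {"BASE": ("81m29.320s", 0.998112, 0.999633),
            "EXPT": ("107m3.893s", 0.998156, 0.999642)},
    "WES": {"BASE": ("10m5.515s", 0.973295, 0.999318),
            "EXPT": ("19m25.869s", 0.974056, 0.999362)},
}


def to_seconds(runtime):
    """Convert a runtime such as '81m29.320s' to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", runtime)
    if m is None:
        raise ValueError(f"unrecognised runtime: {runtime!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


def summarize(name):
    """Return (EXPT/BASE runtime ratio, Indel F1 delta, SNP F1 delta)."""
    base_rt, base_indel, base_snp = RESULTS[name]["BASE"]
    expt_rt, expt_indel, expt_snp = RESULTS[name]["EXPT"]
    slowdown = to_seconds(expt_rt) / to_seconds(base_rt)
    return slowdown, expt_indel - base_indel, expt_snp - base_snp


for name in RESULTS:
    slowdown, d_indel, d_snp = summarize(name)
    print(f"{name}: EXPT runtime x{slowdown:.2f}, "
          f"Indel F1 {d_indel:+.6f}, SNP F1 {d_snp:+.6f}")
```

With these numbers, turning the window selector off makes make_examples take about 31% longer on WGS and about 93% longer on WES. Every F1 change is below 0.001.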
