Parameter settings for somatic/germline variant calling
Dear team, may I know whether the parameter settings differ between somatic variant calling and germline variant calling? I ask because DeepVariant called one variant as heterozygous, but in IGV it looks homozygous. I'm wondering whether my DeepVariant settings are too loose for this variant. (PS: I just ran DeepVariant with the default PacBio settings.)
Here is the result in the VCF. The VAF is quite high. Does an AD of 1,20 mean that only 1 read supported the wild-type allele?
chr4 3079267 . G T 36.1 PASS . GT:GQ:DP:AD:VAF:PL 0/1:4:21:1,20:0.952381:33,0,1
Here is the screenshot of IGV
Thanks!!
Issue Analytics
- State:
- Created 2 years ago
- Comments:5
Top GitHub Comments
Hi @Qianwangwoo
We don't do any additional filtering beyond the probabilities from the classifier. In this case, DeepVariant is not confident in choosing between the HET and HOM-ALT genotypes (a GQ of 4 corresponds to ~60% confidence that the genotype call is correct). The QUAL value of 36.1 suggests that DeepVariant is at least fairly confident that the position is not REF.
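To make the arithmetic behind those numbers concrete, here is a minimal sketch of the standard Phred conversion applied to the GQ, QUAL, and PL values of the record in question. The helper names are made up for illustration and are not part of DeepVariant. Normalizing the PL values into probabilities assumes a flat prior and ignores the integer rounding of PL, so the result is only an approximation of what the classifier actually produced.

```python
def phred_to_error(q):
    """Convert a Phred-scaled value to an error probability: 10^(-q/10)."""
    return 10 ** (-q / 10)


def genotype_confidence(gq):
    """Probability that the genotype call is correct, given its GQ."""
    return 1 - phred_to_error(gq)


def pl_to_probs(pl):
    """Turn Phred-scaled genotype likelihoods into normalized probabilities.

    Assumption: flat prior over genotypes; PL values are rounded integers,
    so these are approximate.
    """
    likelihoods = [phred_to_error(p) for p in pl]
    total = sum(likelihoods)
    return [x / total for x in likelihoods]


# Values taken from the VCF record in the question.
gq_conf = genotype_confidence(4)      # about 0.60
p_ref = phred_to_error(36.1)          # chance the site is actually REF
probs = pl_to_probs([33, 0, 1])       # order: 0/0, 0/1, 1/1

print(f"GQ=4 -> genotype confidence {gq_conf:.3f}")
print(f"QUAL=36.1 -> P(REF) {p_ref:.5f}")
print("PL=33,0,1 -> P(0/0), P(0/1), P(1/1) =", [round(p, 4) for p in probs])
```

The PL values for 0/1 and 1/1 differ by only one Phred unit, which is why the two genotypes come out nearly equally likely while 0/0 is effectively ruled out.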
A few other points to keep in mind. First, are you using the two-pass DeepVariant-WhatsHap-DeepVariant method? If so, DeepVariant may be using additional longer-range phasing information.
Second, this variant is at a junction between homopolymers (poly-T and poly-G). This represents the dominant error mode for PacBio HiFi, so it may not be straightforward for a human to assess the probability of a G->T variant here as opposed to a sequencing error consisting of a T insertion and a G deletion.
If you want to be sure of higher precision, you can additionally filter on the GQ value (e.g. GQ >= 10 for 90% confidence in the genotype call). However, if you do so, you will lose variant positions like this one, which are very likely not reference but are difficult to genotype.
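As a rough sketch of what such a GQ filter could look like, the snippet below parses the sample column of a single-sample VCF line in plain Python and keeps only records at or above a threshold. The function names and the `min_gq` parameter are hypothetical, chosen for this example. It assumes tab-separated records with one sample column; a real pipeline would more likely use an existing VCF tool.

```python
def sample_fields(vcf_line):
    """Map FORMAT keys to the values of the first sample column."""
    cols = vcf_line.rstrip("\n").split("\t")
    keys = cols[8].split(":")      # FORMAT column, e.g. GT:GQ:DP:AD:VAF:PL
    values = cols[9].split(":")    # first (only) sample column
    return dict(zip(keys, values))


def passes_gq(vcf_line, min_gq=10):
    """Keep header lines, and records whose GQ is at least min_gq."""
    if vcf_line.startswith("#"):
        return True
    gq = sample_fields(vcf_line).get("GQ", ".")
    return gq != "." and int(gq) >= min_gq


# The record from the question, rebuilt with explicit tab separators.
record = "\t".join([
    "chr4", "3079267", ".", "G", "T", "36.1", "PASS", ".",
    "GT:GQ:DP:AD:VAF:PL", "0/1:4:21:1,20:0.952381:33,0,1",
])

fields = sample_fields(record)
ref_reads, alt_reads = (int(x) for x in fields["AD"].split(","))

print("GQ:", fields["GQ"], "kept at GQ>=10:", passes_gq(record))
print("alt fraction from AD:", alt_reads / (ref_reads + alt_reads))
```

With the threshold at 10, this record (GQ 4) is dropped, which illustrates the trade-off described above: the site is very likely a real variant, but its genotype is uncertain.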
Hi @Qianwangwoo
Yes, the two-pass method generally improves accuracy for PacBio small variant calling, especially for indels. Whether it is likely to improve this call, I am not sure. Note that we anticipate releasing, in the near future, a version of DeepVariant for PacBio that will have comparable accuracy with a single pass of variant calling, so you may prefer to keep your current workflow and wait for that version if you don't mind updating.