Upstream deletions and CollectVariantCallingMetrics do not play nice right now.

Original GitHub issue: https://github.com/broadinstitute/picard/issues/555

The current VCF spec allows for a * allele (no brackets):

“The ‘*’ allele is reserved to indicate that the allele is missing due to a upstream deletion.”

CollectVariantCallingMetrics treats this as a third (size 1!!) allele so that in the case of

1   10347   .   TAAACCCTA   T   100 .   AC=2    GT  0/1 0/1
1   10350   .   A           C,* 100 .   AC=3    GT  1/2 0/2

both the 0/2 and 1/2 genotypes in the second line are counted towards TOTAL_MULTIALLELIC_SNPS (for the detailed metrics). Also, neither of these genotypes is counted towards TOTAL_SNPS (as that only captures bi-allelic SNPs). So upstream deletions are “hurting” both the monomorphic samples (as they get an inflated TOTAL_MULTIALLELIC_SNPS) and the polymorphic samples (as they get a deflated TOTAL_SNPS count).
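To make the effect concrete, here is a minimal Python sketch of the classification as described above. It is not Picard's actual code: `classify_site` and its return labels are hypothetical names, and the only thing it models is that `*` is treated as one more single-base allele.

```python
def classify_site(ref, alts):
    """Classify a site, treating '*' as an ordinary allele of length 1.

    Simplified model of the behavior described in this issue, not Picard's code.
    """
    alleles = [ref] + list(alts)
    # Any allele longer than one base makes this something other than a SNP site.
    if not all(len(allele) == 1 for allele in alleles):
        return "INDEL_OR_COMPLEX"
    # '*' has length 1, so it counts as a second ALT and pushes the site
    # out of the bi-allelic bucket.
    return "BIALLELIC_SNP" if len(alts) == 1 else "MULTIALLELIC_SNP"


# The two records from the example above.
print(classify_site("TAAACCCTA", ["T"]))  # the upstream deletion itself
print(classify_site("A", ["C", "*"]))     # the SNP site overlapped by the deletion
```

Under this model the record at 10350 lands in the multi-allelic bucket even though only one real alternate base (C) is present, which is why both samples are counted towards TOTAL_MULTIALLELIC_SNPS and neither towards TOTAL_SNPS.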

I propose changing this behavior so that an upstream deletion will count as the reference allele for the purpose of metrics.
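A rough sketch of what "count as the reference allele" could look like, written in Python for illustration only. `fold_upstream_deletion` is a hypothetical helper, not an existing Picard or HtsJdk function; it assumes genotypes are given as tuples of allele indices (0 = REF, 1.. = ALT in order) with `None` for a missing call.

```python
SPAN_DEL = "*"  # the upstream-deletion allele from the VCF spec


def fold_upstream_deletion(alts, genotypes):
    """Drop '*' from the ALT list and remap genotype indices so that
    every '*' call becomes a reference (0) call.

    Hypothetical helper for illustration; metrics would then be computed
    on the returned alleles and genotypes.
    """
    index_map = {None: None, 0: 0}  # missing stays missing, REF stays REF
    kept_alts = []
    for old_index, alt in enumerate(alts, start=1):
        if alt == SPAN_DEL:
            index_map[old_index] = 0  # upstream deletion counts as REF
        else:
            kept_alts.append(alt)
            index_map[old_index] = len(kept_alts)  # re-number remaining ALTs
    new_genotypes = [tuple(index_map[i] for i in gt) for gt in genotypes]
    return kept_alts, new_genotypes


# Record at 1:10350 from the example: ALT = C,* with genotypes 1/2 and 0/2.
alts, genotypes = fold_upstream_deletion(["C", "*"], [(1, 2), (0, 2)])
print(alts, genotypes)
```

With this remapping the site has a single ALT (C), the first sample looks heterozygous for an ordinary bi-allelic SNP, and the second looks homozygous reference, so neither would be pushed into the multi-allelic counts.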

I will also add a column or two to capture the number of upstream deletions, perhaps counting the 0/2 genotypes separately from the 1/2 genotypes.
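One possible way to split those counts, again as a Python sketch rather than a proposal for the real implementation. The category names (`REF_AND_DELETION`, `ALT_AND_DELETION`, `DELETION_ONLY`) are made up for illustration and are not existing metric columns; genotypes are assumed to be fully called tuples of allele indices.

```python
from collections import Counter


def upstream_deletion_category(alts, genotype):
    """Return a label describing how a genotype involves the '*' allele,
    or None if it does not involve it at all. Labels are hypothetical."""
    if "*" not in alts:
        return None
    star_index = alts.index("*") + 1  # ALT indices start at 1
    if star_index not in genotype:
        return None
    others = [i for i in genotype if i != star_index]
    if not others:
        return "DELETION_ONLY"      # e.g. 2/2 when '*' is the second ALT
    if all(i == 0 for i in others):
        return "REF_AND_DELETION"   # e.g. 0/2
    return "ALT_AND_DELETION"       # e.g. 1/2


# Tally the genotypes of the record at 1:10350 from the example.
counts = Counter(
    upstream_deletion_category(["C", "*"], gt) for gt in [(1, 2), (0, 2)]
)
print(counts)
```

Keeping the 0/* and ALT/* cases in separate counters would let a reader of the metrics see how many reference-looking calls are actually half-covered by a deletion.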

Does this sound reasonable to folks?

@eitanbanks @tfenne ?

Issue Analytics

  • State: open
  • Created 7 years ago
  • Comments: 22 (15 by maintainers)

Top GitHub Comments

1 reaction
yfarjoun commented, Jan 19, 2017

still a thing! on my “todo” list too!

On Wed, Jan 18, 2017 at 8:23 PM, Geraldine Van der Auwera wrote:

Is this still a thing or has the relevant work been done/closed?


0 reactions
yfarjoun commented, Feb 14, 2017

This needs to be put on hold until I modify VariantContext in HtsJdk…
