BALSAMIC's merged somatic mutation VCF header, and discussion of INFO/FORMAT tags

Hi!

BALSAMIC’s merged SNV and small indel VCF is finalized, and it will be the last piece of release 3.0.0. Before releasing, I thought I should give a heads-up on the annotations and tags, and also get some feedback on them.

The header and four example variants after VEP annotation are pasted below (I removed the VEP annotation from the variants to make the lines shorter, but you get the idea). The samples are named NORMAL and TUMOR to make things clearer.

##fileformat=VCFv4.1
##fileDate=20190807
##source=VCFmerge
##source_version=0.0.1
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=AD,Number=R,Type=Float,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=AF,Number=1,Type=Float,Description="Allele fraction of the event">
##FORMAT=<ID=DP,Number=1,Type=Float,Description="Read depth in the sample">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Combined depth across samples recalculated from reported AD">
##INFO=<ID=VARCALLER,Number=.,Type=String,Description="Variant caller called this variant separated by comma">
##INFO=<ID=VARCALLER_FILTER,Number=.,Type=String,Description="Variant caller filters assigned to this variant separated by comma">
##INFO=<ID=VARCALLER_DP,Number=.,Type=String,Description="Variant caller depth assigned to this variant separated by comma">
##INFO=<ID=VARCALLER_COUNT,Number=1,Type=Integer,Description="Number of variant callers called this variant">
##INFO=<ID=VARCALLER_QUAL,Number=.,Type=String,Description="Variant quality assigned to this variant by variant callers separated by comma">
##INFO=<ID=VARCALLER_NORMAL_GT,Number=.,Type=String,Description="Genotype for NORMAL sample assigned by variant callers.">
##INFO=<ID=VARCALLER_TUMOR_GT,Number=.,Type=String,Description="Genotype for TUMOR sample assigned by variant callers.">
##INFO=<ID=TYPE,Number=1,Type=String,Description="Variant type assigned by bcftools 1.9. snp, mnp, indel, other ">
##INFO=<ID=GC,Number=1,Type=Float,Description="GC content around the variant">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##contig=<ID=1>
##contig=<ID=2>
##contig=<ID=3>
##contig=<ID=4>
##contig=<ID=5>
##contig=<ID=6>
##contig=<ID=7>
##contig=<ID=8>
##contig=<ID=9>
##contig=<ID=10>
##contig=<ID=11>
##contig=<ID=12>
##contig=<ID=13>
##contig=<ID=14>
##contig=<ID=15>
##contig=<ID=16>
##contig=<ID=17>
##contig=<ID=18>
##contig=<ID=19>
##contig=<ID=20>
##contig=<ID=21>
##contig=<ID=22>
##contig=<ID=Y>
##contig=<ID=X>
##VEP="v94" time="2019-08-06 11:01:13" cache="vep_cache" ensembl-io=94.8d53275 ensembl=94.5c08d90 ensembl-funcgen=94.08b0c13 ensembl-variation=94.066b102 1000genomes="phase3" COSMIC="81" ClinVar="201706" ESP="20141103" HGMD-PUBLIC="20164" assembly="GRCh37.p13" dbSNP="150" gencode="GENCODE 19" genebuild="2011-04" gnomAD="170228" polyphen="2.2.2" refseq="01_2015" regbuild="1.0" sift="sift5.2.2"
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|FLAGS|VARIANT_CLASS|SYMBOL_SOURCE|HGNC_ID|CANONICAL|TSL|APPRIS|CCDS|ENSP|SWISSPROT|TREMBL|UNIPARC|REFSEQ_MATCH|SOURCE|GIVEN_REF|USED_REF|BAM_EDIT|GENE_PHENO|SIFT|PolyPhen|DOMAINS|miRNA|HGVS_OFFSET|AF|AFR_AF|AMR_AF|EAS_AF|EUR_AF|SAS_AF|AA_AF|EA_AF|gnomAD_AF|gnomAD_AFR_AF|gnomAD_AMR_AF|gnomAD_ASJ_AF|gnomAD_EAS_AF|gnomAD_FIN_AF|gnomAD_NFE_AF|gnomAD_OTH_AF|gnomAD_SAS_AF|MAX_AF|MAX_AF_POPS|CLIN_SIG|SOMATIC|PHENO|PUBMED|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	NORMAL	TUMOR
1	36932852	.	C	T	.	PASS	TYPE=SNP;VARCALLER=mutect2;DP=2932;VARCALLER_FILTER=PASS;VARCALLER_DP=2932;VARCALLER_QUAL=.;VARCALLER_COUNT=1;VARCALLER_NORMAL_GT=0/0;VARCALLER_TUMOR_GT=0/1	DP:AD:AF:GT	1384:1376,8:1392:./.	1548:1538,10:1558:./.
1	215848824	.	C	T	377	PASS	TYPE=SNP;VARCALLER=mutect2,vardict,strelka;DP=5589;VARCALLER_FILTER=clustered_events|multi_event_alt_allele_in_normal,PASS,PASS;VARCALLER_DP=2081,5589,5549;VARCALLER_QUAL=.,377,.;VARCALLER_COUNT=3;VARCALLER_NORMAL_GT=0/0,0/0,.;VARCALLER_TUMOR_GT=0/1,0/1,.	DP:AD:AF:GT	2542:2541,1:2543:./.	3047:1664,1383:4430:./.
1	216371793	.	A	G	373	PASS	TYPE=SNP;VARCALLER=vardict,strelka;DP=4784;VARCALLER_FILTER=PASS,LowEVS;VARCALLER_DP=4784,4746;VARCALLER_QUAL=373,.;VARCALLER_COUNT=2;VARCALLER_NORMAL_GT=0/1,.;VARCALLER_TUMOR_GT=0/1,.	DP:AD:AF:GT	2188:1127,1061:3249:./.	2596:1418,1178:3774:./.
2	145156768	.	G	C	192	.	TYPE=SNP;VARCALLER=mutect2,vardict,strelka;DP=5546;VARCALLER_FILTER=clustered_events|multi_event_alt_allele_in_normal,p8|P0.01Likely,LowEVS;VARCALLER_DP=1837,5546,5522;VARCALLER_QUAL=.,192,.;VARCALLER_COUNT=3;VARCALLER_NORMAL_GT=0/0,0/1,.;VARCALLER_TUMOR_GT=0/1,0/1,.	DP:AD:AF:GT	2531:2512,19:2550:./.	3015:2977,38:3053:./.
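
The header describes INFO/DP as "Combined depth across samples recalculated from reported AD". A minimal Python sketch of that relationship, assuming it simply means summing every AD value over the NORMAL and TUMOR columns (this is inferred from the header text and the example values above, not taken from VCFmerge's code; `combined_depth` is a hypothetical helper name):

```python
def combined_depth(format_keys: str, *samples: str) -> int:
    """Sum the AD values of every sample column to get a combined depth.

    Assumption: INFO/DP is the plain sum of all AD entries across samples.
    """
    # Locate AD within the colon-separated FORMAT key list, e.g. "DP:AD:AF:GT".
    ad_index = format_keys.split(":").index("AD")
    total = 0
    for sample in samples:
        # AD holds comma-separated per-allele depths, e.g. "1376,8".
        ad_field = sample.split(":")[ad_index]
        total += sum(int(value) for value in ad_field.split(","))
    return total


# FORMAT, NORMAL and TUMOR columns of the first example variant (1:36932852).
depth = combined_depth(
    "DP:AD:AF:GT",
    "1384:1376,8:1392:./.",
    "1548:1538,10:1558:./.",
)
print(depth)  # 2932, which matches DP=2932 in that variant's INFO column
```

The same sum reproduces the INFO/DP value of the other three example variants as well.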

This is the output of a package I am working on to merge VCFs for SNPs/SNVs and small indels, called VCFmerge. It is essentially a wrapper: it runs a bunch of models and statistics through multiple bioinformatics tools and Python packages to prepare a final VCF file from multiple variant callers. VCFmerge supports standard VCF from any variant caller, as well as Strelka output.

All the INFO and FORMAT tags are computed from the input BAM files, and anything caller-specific is removed (MQ and GC are also recalculated). Some of the information that would otherwise be lost is kept in the INFO/VARCALLER* tags, which I think could be displayed in Scout at the variant level.
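
To show how a downstream consumer might read the caller-specific tags, here is a rough Python sketch that splits them into one record per caller. It assumes the comma-separated values in each VARCALLER_* tag are in the same order as the callers listed in VARCALLER, which is what the example variants suggest but is not stated in the header; `parse_info` and `per_caller` are hypothetical helper names.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO column into a {key: value} dictionary."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def per_caller(info: str) -> dict:
    """Group VARCALLER_* values by caller.

    Assumption: values are positionally aligned with INFO/VARCALLER.
    """
    fields = parse_info(info)
    callers = fields["VARCALLER"].split(",")
    prefix = "VARCALLER_"
    # VARCALLER_COUNT is a single number, not a per-caller list, so skip it.
    tags = {
        key[len(prefix):]: value.split(",")
        for key, value in fields.items()
        if key.startswith(prefix) and key != "VARCALLER_COUNT"
    }
    return {
        caller: {tag: values[i] for tag, values in tags.items()}
        for i, caller in enumerate(callers)
    }


# INFO column of the third example variant (1:216371793).
info = (
    "TYPE=SNP;VARCALLER=vardict,strelka;DP=4784;"
    "VARCALLER_FILTER=PASS,LowEVS;VARCALLER_DP=4784,4746;"
    "VARCALLER_QUAL=373,.;VARCALLER_COUNT=2;"
    "VARCALLER_NORMAL_GT=0/1,.;VARCALLER_TUMOR_GT=0/1,."
)
records = per_caller(info)
print(records["strelka"])
```

Multiple filters from a single caller appear joined with `|` in the examples (for instance `clustered_events|multi_event_alt_allele_in_normal`), so splitting on commas alone keeps them attached to the right caller.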

What are your thoughts on this? The new BALSAMIC release is well past due, so I would appreciate quick comments.

PS: this is dummy data, and these variants are not real somatic mutations.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
hassanfa commented, Sep 11, 2020

Of course. I know where the issue is; when it goes into prod, I’ll reopen 😃

0 reactions
hassanfa commented, Sep 11, 2020

FYI, vcfmerge was a Python package I was working on for ensemble calling of somatic mutations based on posterior calls. But it was very hard to speed up…
