How to get the FASTA phased genomes from the summarized VCF file?

Hi,

I would like to get the phased genomes as FASTA files. I read that the tool gatk FastaAlternateReferenceMaker can build them from a VCF file, but the summarized VCF file produced by the IsoPhase pipeline does not match the format it expects.

Specifically, when I run gatk FastaAlternateReferenceMaker I get this error message: "The FORMAT field was provided but there is no genotype/sample data, for input source: my_file.vcf". The summarized VCF file looks like this:

##fileformat=VCFv4.2
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT
Super-Scaffold_100001   672149  .       C       T       .       PASS    DP=210;AF=0.51  GT:HQ
Super-Scaffold_100001   862122  .       A       T       .       PASS    DP=305;AF=0.5   GT:HQ
Super-Scaffold_100001   931168  .       C       A       .       PASS    DP=127;AF=0.5   GT:HQ
Super-Scaffold_100001   967240  .       C       T       .       PASS    DP=127;AF=0.5   GT:HQ

The FORMAT column is present, but no genotype (sample) column follows it in my VCF. Any idea?
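
To make the mismatch concrete, here is a minimal Python sketch that inspects the #CHROM header line and reports whether a FORMAT column is declared and which sample columns follow it. The function name header_layout is hypothetical, and the sketch assumes the file is tab-delimited, as the VCF specification requires; it only diagnoses the header and does not add genotypes.

```python
def header_layout(vcf_lines):
    """Return (has_format, sample_names) for the #CHROM header line of a VCF.

    Assumes tab-delimited columns. The eight fixed columns are
    CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO; FORMAT, if present,
    is the ninth, and every column after it is a sample.
    """
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            columns = line.rstrip("\r\n").split("\t")
            has_format = len(columns) > 8 and columns[8] == "FORMAT"
            samples = columns[9:] if has_format else []
            return has_format, samples
    raise ValueError("no #CHROM header line found")


# Header shaped like the one in the question: FORMAT is declared, no sample follows.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\n",
]
has_format, samples = header_layout(example)
if has_format and not samples:
    print("FORMAT column declared but no sample column follows it")
```

On a real file, the same function can be called with an open file handle, for example `header_layout(open("my_file.vcf"))`.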

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

Magdoll commented, Nov 12, 2020

Hi @pailloufat-stack, Cupcake v15.0.0 should now add a new ‘SAMPLE’ column when you call collect_all_vcf.py; see the wiki for details.

Let me know if this works for you! -Liz

Top Results From Across the Web

Generate new genome given vcf file - HemTools
Generate new genome sequence and BWA (v0.7.17a) index and black_list.bed given a vcf file. The tool is allele seq (see option 3 in...

Getting haplotype specific FASTA files from VCF files - Biostars
The Haplotype can be specified using the -H parameter (as 1 or 2). The resulting FASTA file can be then used to extract...

Chapter 17 Bioinformatic file formats
In a sentence, the most important are: FASTA, FASTQ, SAM, BAM, and VCF. Plan: Go ... All of these scaffolds go into the...

FastaAlternateReferenceMaker - GATK
Create an alternative reference by combining a fasta with a vcf. ... The reference, requested intervals, and any number of variant files.

TCGA VCF 1.1v2 - GDC Docs - National Cancer Institute
Variant Call Format (VCF) is a format for storing and reporting genomic ... summarizes TCGA-specific customizations that have been added to the VCF...
