Can't get children from NCBI GFF (may be user error)
Hi, I’ve been trying to use gffutils to parse an NCBI GFF and ran into an issue getting the children. I’m testing with a small GFF of three genes pulled from the original GFF. Things started out OK:
if not test_db:
    db=gffutils.create_db(test, test_db, id_spec={'gene': 'db_xref'})
db=gffutils.FeatureDB(test_db)
for i in db.featuretypes():
    print("Feature: %s: %d" % (i, db.count_features_of_type(i)))
Feature: CDS: 38
Feature: exon: 41
Feature: gene: 3
Feature: mRNA: 6
And I can get genes:
for g in db.features_of_type('gene'):
    print(g)
chr1 BestRefSeq%2CGnomon gene 11686635 11725857 . + . ID=gene-DRAXIN;Dbxref=GeneID:374946,HGNC:HGNC:25054,MIM:612682;Name=DRAXIN;description=dorsal inhibitory axon guidance protein;gbkey=Gene;gene=DRAXIN;gene_biotype=protein_coding;gene_synonym=AGPA3119,C1orf187,neucrin,UNQ3119
chr1 BestRefSeq gene 15617458 15669044 . + . ID=gene-DDI2;Dbxref=GeneID:84301,HGNC:HGNC:24578;Name=DDI2;description=DNA damage inducible 1 homolog 2;gbkey=Gene;gene=DDI2;gene_biotype=protein_coding
chr1 BestRefSeq gene 19920009 19923617 . - . ID=gene-PLA2G2E;Dbxref=GeneID:30814,HGNC:HGNC:13414,MIM:618320;Name=PLA2G2E;description=phospholipase A2 group IIE;gbkey=Gene;gene=PLA2G2E;gene_biotype=protein_coding;gene_synonym=GIIE sPLA2,sPLA2-IIE
and a specific gene:
gene=db['gene_1']
print(gene)
chr1 BestRefSeq%2CGnomon gene 11686635 11725857 . + . ID=gene-DRAXIN;Dbxref=GeneID:374946,HGNC:HGNC:25054,MIM:612682;Name=DRAXIN;description=dorsal inhibitory axon guidance protein;gbkey=Gene;gene=DRAXIN;gene_biotype=protein_coding;gene_synonym=AGPA3119,C1orf187,neucrin,UNQ3119
but no children:
for i in db.children(gene):
    print(i)
Is this just user error or something with the NCBI GFF?
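One thing that may be worth checking, offered as a guess rather than a confirmed diagnosis: the id_spec above names 'db_xref', while the gene records shown carry the attribute 'Dbxref'. The sketch below uses only the standard library (no gffutils) to report which gene lines lack a given attribute key and which Parent values point at no ID in the file. The function names and the sample mRNA line are hypothetical; the gene line is abbreviated from the output above.

```python
# Minimal stdlib sketch: inspect GFF3 attributes without gffutils.
# parse_attributes and inspect_gff are illustrative helpers, not gffutils API.

def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict of key -> value string."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def inspect_gff(lines, id_key="db_xref"):
    """Return (IDs of genes lacking id_key, Parent values with no matching ID)."""
    ids, parents, genes_missing_key = set(), set(), []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            ids.add(attrs["ID"])
        if "Parent" in attrs:
            # A feature may list several parents separated by commas
            parents.update(attrs["Parent"].split(","))
        if cols[2] == "gene" and id_key not in attrs:
            genes_missing_key.append(attrs.get("ID", "<no ID>"))
    return genes_missing_key, sorted(parents - ids)


sample = [
    # Abbreviated from the DDI2 gene record in the post
    "chr1\tBestRefSeq\tgene\t15617458\t15669044\t.\t+\t.\t"
    "ID=gene-DDI2;Dbxref=GeneID:84301;Name=DDI2",
    # Hypothetical child record, not copied from the original file
    "chr1\tBestRefSeq\tmRNA\t15617458\t15669044\t.\t+\t.\t"
    "ID=rna-EXAMPLE;Parent=gene-DDI2",
]

missing, orphans = inspect_gff(sample)
print("genes without 'db_xref':", missing)
print("Parent values with no matching ID:", orphans)
```

If every gene turns up in the first list, that would fit with the gene being retrievable as 'gene_1': as far as I understand it, gffutils falls back to an auto-numbered ID when the id_spec key is absent, while the child records still refer to 'gene-DRAXIN' through their Parent attribute. In that case, rebuilding the database without the id_spec argument (or with id_spec='ID') may be enough for db.children() to return results, though I have not verified this against the original file.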
Issue Analytics
- State:
- Created 4 years ago
- Reactions:1
- Comments:13 (1 by maintainers)
Top GitHub Comments
I have never come across a GTF where it did not work, but I can’t be sure. I am using GTF files from Ensembl.
No, all of these features can be deduced from the exons. Each exon feature carries a transcript_id, so if you ask gffutils to build the db, it will reconstruct a transcript whose start is the start of the first exon and whose end is the end of the last exon. You are not losing any information. You can try it and confirm.
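To make the idea in the comment above concrete, here is a rough illustration of inferring transcript extents from exons alone. It is a plain-Python sketch of the principle, not gffutils's actual implementation; the function name and the exon tuples are made up for the example.

```python
# Sketch of inferring transcript spans from exon records that share a
# transcript_id. infer_transcripts is a hypothetical helper, not gffutils code.

def infer_transcripts(exons):
    """exons: iterable of (transcript_id, start, end).

    Returns a dict mapping transcript_id -> (start, end), where start is the
    smallest exon start and end is the largest exon end for that transcript.
    """
    spans = {}
    for transcript_id, start, end in exons:
        if transcript_id in spans:
            cur_start, cur_end = spans[transcript_id]
            spans[transcript_id] = (min(cur_start, start), max(cur_end, end))
        else:
            spans[transcript_id] = (start, end)
    return spans


# Hypothetical exon coordinates for two transcripts
exons = [("tx1", 100, 200), ("tx1", 300, 450), ("tx2", 50, 80)]
spans = infer_transcripts(exons)
print(spans)
```

The same grouping applied to gene_id would give the gene extents, which is why a GTF containing only exon lines can still yield transcript and gene features.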