Can't get children from NCBI GFF (may be user error)


Hi, I've been trying to use gffutils to parse an NCBI GFF and ran into an issue getting the children of a gene. I'm testing with a small GFF of three genes pulled from the original file. Things started out OK:

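import os
import gffutils

# test is the path to the small GFF, test_db the path to the SQLite database
if not os.path.exists(test_db):
    db = gffutils.create_db(test, test_db, id_spec={'gene': 'db_xref'})
db = gffutils.FeatureDB(test_db)
# count the features of each type stored in the database
for i in db.featuretypes():
    print("Feature: %s: %d" % (i, db.count_features_of_type(i)))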

Feature: CDS: 38
Feature: exon: 41
Feature: gene: 3
Feature: mRNA: 6

And I can get the genes:

for g in db.features_of_type('gene'):
    print(g)

chr1	BestRefSeq%2CGnomon	gene	11686635	11725857	.	+	.	ID=gene-DRAXIN;Dbxref=GeneID:374946,HGNC:HGNC:25054,MIM:612682;Name=DRAXIN;description=dorsal inhibitory axon guidance protein;gbkey=Gene;gene=DRAXIN;gene_biotype=protein_coding;gene_synonym=AGPA3119,C1orf187,neucrin,UNQ3119
chr1	BestRefSeq	gene	15617458	15669044	.	+	.	ID=gene-DDI2;Dbxref=GeneID:84301,HGNC:HGNC:24578;Name=DDI2;description=DNA damage inducible 1 homolog 2;gbkey=Gene;gene=DDI2;gene_biotype=protein_coding
chr1	BestRefSeq	gene	19920009	19923617	.	-	.	ID=gene-PLA2G2E;Dbxref=GeneID:30814,HGNC:HGNC:13414,MIM:618320;Name=PLA2G2E;description=phospholipase A2 group IIE;gbkey=Gene;gene=PLA2G2E;gene_biotype=protein_coding;gene_synonym=GIIE sPLA2,sPLA2-IIE

and a specific gene:

gene = db['gene_1']
print(gene)

chr1	BestRefSeq%2CGnomon	gene	11686635	11725857	.	+	.	ID=gene-DRAXIN;Dbxref=GeneID:374946,HGNC:HGNC:25054,MIM:612682;Name=DRAXIN;description=dorsal inhibitory axon guidance protein;gbkey=Gene;gene=DRAXIN;gene_biotype=protein_coding;gene_synonym=AGPA3119,C1orf187,neucrin,UNQ3119

but no children:

for i in db.children(gene):
    print(i)

Is this just user error or something with the NCBI GFF?
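
One possible explanation, offered as a guess rather than a confirmed diagnosis: the gene records above carry a `Dbxref` attribute, not `db_xref`, so the `id_spec` would never match and the gene would be stored under a fallback key such as `gene_1` (which is the key used in the lookup above). Child records point at their parent through `Parent=gene-DRAXIN`, so nothing would be filed under `gene_1`. The sketch below models that mismatch in plain Python. It is not gffutils code; the mRNA record and its ID `rna-EXAMPLE` are invented for illustration, and the gene attributes are abbreviated from the post.

```python
def parse_attributes(column9):
    """Turn a GFF3 attribute column ('key=value;key=value') into a dict."""
    return dict(field.split("=", 1) for field in column9.split(";") if field)

# Gene attributes abbreviated from the post; the mRNA record is hypothetical.
gene_attrs = parse_attributes("ID=gene-DRAXIN;Dbxref=GeneID:374946;Name=DRAXIN")
mrna_attrs = parse_attributes("ID=rna-EXAMPLE;Parent=gene-DRAXIN")

# Children are filed under whatever their Parent attribute says.
children_by_parent = {}
children_by_parent.setdefault(mrna_attrs["Parent"], []).append(mrna_attrs["ID"])

# Assumption: an id_spec naming a missing attribute ('db_xref') falls back
# to an autoincremented key such as 'gene_1'.
key_from_id_spec = gene_attrs.get("db_xref", "gene_1")
key_from_id_attr = gene_attrs["ID"]

print(children_by_parent.get(key_from_id_spec, []))  # → []
print(children_by_parent.get(key_from_id_attr, []))  # → ['rna-EXAMPLE']
```

If this is what is happening, building the database without the custom `id_spec`, so that the `ID` attribute becomes the key, and then looking up `db['gene-DRAXIN']` may be worth trying.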

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 13 (1 by maintainers)

Top GitHub Comments

2 reactions
lldelisle commented, Jun 10, 2020

I have never come across a GTF where it did not work, but I don't know… I am using GTF files from Ensembl.

1 reaction
lldelisle commented, Jun 10, 2020

No, all these features can be deduced from the exons. Each exon feature carries the transcript_id … if you ask gffutils to build the db, it will reconstruct a transcript whose start is the start of the first exon and whose end is the end of the last exon. You are not losing any information… You can try it and confirm.
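
As a rough illustration of the reconstruction described above (a plain-Python sketch, not gffutils' actual implementation), exons can be grouped by transcript_id and each transcript given the smallest exon start and the largest exon end. The function name `infer_transcript_spans` and the exon coordinates are made up for the example.

```python
def infer_transcript_spans(exons):
    """Return {transcript_id: (start, end)} from (transcript_id, start, end) exons."""
    spans = {}
    for transcript_id, start, end in exons:
        if transcript_id in spans:
            old_start, old_end = spans[transcript_id]
            # widen the span so it covers every exon seen so far
            spans[transcript_id] = (min(old_start, start), max(old_end, end))
        else:
            spans[transcript_id] = (start, end)
    return spans

# Hypothetical exon records; order in the file does not matter.
exons = [
    ("tx1", 100, 200),
    ("tx1", 300, 450),
    ("tx2", 50, 80),
    ("tx1", 500, 620),
]
spans = infer_transcript_spans(exons)
print(spans)  # → {'tx1': (100, 620), 'tx2': (50, 80)}
```

A gene extent could be inferred the same way by grouping on gene_id instead of transcript_id.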


