readGTF split issue

Hi,

Thanks a lot for developing xpore!

I’m working on some non-model organisms and testing xpore with our own pipeline. The transcripts were assembled with StringTie, and xpore was installed with conda; the package info is:

# packages in environment at /home/ubuntu/Tools/miniconda3/envs/xpore:
#
# Name                    Version                   Build  Channel
xpore                     2.0                pyh5e36f6f_0    bioconda

We found that the GTF file produced by StringTie was not compatible with xpore. The traceback was as follows:

Traceback (most recent call last):
  File "/home/ubuntu/Tools/miniconda3/envs/xpore/bin/xpore", line 10, in <module>
    sys.exit(main())
  File "/home/ubuntu/Tools/miniconda3/envs/xpore/lib/python3.9/site-packages/xpore/scripts/xpore.py", line 67, in main
    options.func(options)
  File "/home/ubuntu/Tools/miniconda3/envs/xpore/lib/python3.9/site-packages/xpore/scripts/dataprep.py", line 692, in dataprep
    gtf_dict = readGTF(gtf_path_or_url)
  File "/home/ubuntu/Tools/miniconda3/envs/xpore/lib/python3.9/site-packages/xpore/scripts/dataprep.py", line 184, in readGTF
    tx_id=ln[-1].split('; transcript_id "')[1].split('";')[0]

Looking at the readGTF function, I found that this issue may be due to a “non-standard” split of the GTF attributes (column 9) in readGTF: https://github.com/GoekeLab/xpore/blob/8722c06314d1fec90dde85347186612144fab6e6/xpore/scripts/dataprep.py#L184-L185

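In the StringTie GTF, transcript_id is the first attribute and gene_id the second, whereas in the human hg38 GTF file gene_id comes first. Because readGTF splits on '; transcript_id "', it assumes transcript_id is preceded by another attribute; when transcript_id comes first, the split returns a single element and indexing [1] fails.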
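A minimal sketch that reproduces the failure outside xpore. extract_tx_id is a hypothetical helper that copies only the expression from line 184 of dataprep.py; the gene-first attribute string uses made-up IDs (GENE_A, TX_A) to stand in for an hg38-style line, while the transcript-first string is the first StringTie exon line from this issue.

```python
# Hypothetical helper: copies only the expression from dataprep.py line 184.
def extract_tx_id(attributes):
    return attributes.split('; transcript_id "')[1].split('";')[0]


# hg38-style ordering (gene_id first); the IDs are made up for illustration.
gene_first = 'gene_id "GENE_A"; transcript_id "TX_A"; exon_number "1";'
# StringTie ordering (transcript_id first), from the exon line in this issue.
stringtie_first = 'transcript_id "TCONS_00000001"; gene_id "XLOC_000001"; exon_number "1";'

print(extract_tx_id(gene_first))  # TX_A

# Nothing precedes transcript_id here, so split() returns a one-element list
# and indexing [1] raises IndexError.
try:
    extract_tx_id(stringtie_first)
    stringtie_error = None
except IndexError as err:
    stringtie_error = err
print(repr(stringtie_error))  # IndexError('list index out of range')
```

The traceback above is cut off before the exception line, so this only shows that the expression would raise IndexError for this ordering; it does not confirm the exact message xpore printed.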

Example stringtie.gtf:

Gm01	StringTie	transcript	120160	131940	.	+	.	transcript_id "TCONS_00000001"; gene_id "XLOC_000001"; gene_name "MSTRG.3"; xloc "XLOC_000001"; cmp_ref "Glyma.01G000600.2.Wm82.a4.v1"; class_code "="; cmp_ref_gene "Glyma.01G000600.Wm82.a4.v1"; tss_id "TSS1";
Gm01	StringTie	exon	120160	122559	.	+	.	transcript_id "TCONS_00000001"; gene_id "XLOC_000001"; exon_number "1";
Gm01	StringTie	exon	131469	131940	.	+	.	transcript_id "TCONS_00000001"; gene_id "XLOC_000001"; exon_number "2";
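One order-independent alternative, sketched under assumptions: parse column 9 into a dictionary of key/value pairs and look transcript_id up by name instead of by position. parse_gtf_attributes and exons_by_transcript are hypothetical helpers, not xpore functions, and this is not necessarily what the eventual PR did. The regex assumes every value is double-quoted and contains no embedded quotes, which holds for the StringTie lines above.

```python
import re

# One `key "value"` pair from GTF column 9; assumes double-quoted values
# without embedded double quotes.
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_gtf_attributes(column9):
    """Return the column-9 attributes as a dict, regardless of their order."""
    return dict(ATTR_RE.findall(column9))


def exons_by_transcript(gtf_lines):
    """Group exon (start, end) coordinates by transcript_id."""
    exons = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep only well-formed exon records
        attrs = parse_gtf_attributes(fields[8])
        exons.setdefault(attrs["transcript_id"], []).append(
            (int(fields[3]), int(fields[4]))
        )
    return exons


# The two StringTie exon lines from this issue, rebuilt with explicit tabs.
example = [
    "\t".join(["Gm01", "StringTie", "exon", "120160", "122559", ".", "+", ".",
               'transcript_id "TCONS_00000001"; gene_id "XLOC_000001"; exon_number "1";']),
    "\t".join(["Gm01", "StringTie", "exon", "131469", "131940", ".", "+", ".",
               'transcript_id "TCONS_00000001"; gene_id "XLOC_000001"; exon_number "2";']),
]
print(exons_by_transcript(example))
# {'TCONS_00000001': [(120160, 122559), (131469, 131940)]}
```

Because the lookup is by key, the same code also handles gene_id-first files such as the hg38 GTF. Repeated keys (GENCODE, for example, can list several tag attributes on one line) collapse to the last value in this sketch.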

Hope this helps, thanks! obenno

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

1 reaction
obenno commented, Aug 21, 2021

Hi @yuukiiwa,

Thanks a lot for your prompt response.

If you don’t mind, I could do a PR ^^

Best regards, obenno

0 reactions
obenno commented, Aug 27, 2021

Hi @yuukiiwa,

Thank you and please feel free to close this issue.
