readGTF split issue
See original GitHub issueHi,
Thanks a lot for developing xpore!
I’m working on some non-model organisms, and testing xpore with our own pipeline. The transcripts were assembled with stringtie, and xpore was installed by conda install
, the package info is:
# packages in environment at /home/ubuntu/Tools/miniconda3/envs/xpore:
#
# Name Version Build Channel
xpore 2.0 pyh5e36f6f_0 bioconda
and we found stringtie output gtf file was not compatible with xpore. The trackback was like below:
Traceback (most recent call last):
File "/home/ubuntu/Tools/miniconda3/envs/xpore/bin/xpore", line 10, in <module>
sys.exit(main())
File "/home/ubuntu/Tools/miniconda3/envs/xpore/lib/python3.9/site-packages/xpore/scripts/xpore.py", line 67, in main
options.func(options)
File "/home/ubuntu/Tools/miniconda3/envs/xpore/lib/python3.9/site-packages/xpore/scripts/dataprep.py", line 692, in dataprep
gtf_dict = readGTF(gtf_path_or_url)
File "/home/ubuntu/Tools/miniconda3/envs/xpore/lib/python3.9/site-packages/xpore/scripts/dataprep.py", line 184, in readGTF
tx_id=ln[-1].split('; transcript_id "')[1].split('";')[0]
By checking the readGTF
function, I found this issue may due to a “non-standard” split of gtf attributes (column 9) in readGTF function:
https://github.com/GoekeLab/xpore/blob/8722c06314d1fec90dde85347186612144fab6e6/xpore/scripts/dataprep.py#L184-L185
The stringtie gtf has transcript_id
as the first attribute, but gene_id
as the second, which is not the same as human hg38 gtf file.
example stringtie.gtf:
Gm01 StringTie transcript 120160 131940 . + . transcript_id "TCONS_00000001"; gene_id "XLOC_000001"; gene_name "MSTRG.3"; xloc "XLOC_000001"; cmp_ref "Glyma.01G000600.2.Wm82.a4.v1"; class_code "="; cmp_ref_gene "Glyma.01G000600.Wm82.a4.v1"; tss_id "TSS1";
Gm01 StringTie exon 120160 122559 . + . transcript_id "TCONS_00000001"; gene_id "XLOC_000001"; exon_number "1";
Gm01 StringTie exon 131469 131940 . + . transcript_id "TCONS_00000001"; gene_id "XLOC_000001"; exon_number "2";
Hope this helps, thanks! obenno
Issue Analytics
- State:
- Created 2 years ago
- Comments:7 (3 by maintainers)
Top GitHub Comments
Hi @yuukiiwa,
Thanks a lot for your prompt response.
If you don’t mind, I could do a PR ^^
Best regards, obenno
Hi @yuukiiwa,
Thank you and please feel free to close this issue.