Issue with cluster_report
Hi,
I am trying to detect isoforms, including tissue-specific isoforms. We have 4 tissues, each sequenced on 2 SMRT Cells, so 8 files in total. For each file I have the HQ Iso-Seq sequences and the corresponding cluster_report.csv (unfortunately, at the beginning we ran the processing on each SMRT Cell separately instead of after merging the cells from the same tissue). When I wanted to run collapse_isoforms_by_sam.py, I merged the 2 samples from the same tissue. First, I got a duplicate ID error, because some of the IDs were shared between tissue1-file1.fq and tissue1-file2.fq.
Question1: Is there a more appropriate way to overcome this issue than what I did below?
I merged file1 and file2 of the same tissue and renamed the FASTQ headers to:
>transcript/AutoIncrementID <rest of the header>
Everything went well, and I got the collapsed isoforms, etc.
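A minimal Python sketch of that renaming step, assuming standard 4-line FASTQ records whose header lines start with "@" (the ">" shown above is how the header was written in the post). Unlike a bare auto-increment, it also records which source file each old ID came from, so the per-file cluster reports can be remapped afterwards. The function name rename_fastq and the source tags are hypothetical and not part of cDNA_Cupcake.

```python
from typing import Dict, List, Tuple


def rename_fastq(
    files: List[Tuple[str, str]],
) -> Tuple[str, Dict[Tuple[str, str], str]]:
    """Merge FASTQ texts and give every record a unique transcript/N ID.

    files: list of (source_tag, fastq_text) pairs, e.g. ("tissue1_cell1", text).
    Assumes 4 lines per record (no wrapped sequence lines).
    Returns the merged FASTQ text and a mapping (source_tag, old_id) -> new_id.
    """
    out: List[str] = []
    mapping: Dict[Tuple[str, str], str] = {}
    counter = 0
    for tag, text in files:
        lines = text.splitlines()
        # Walk the file one 4-line record at a time.
        for i in range(0, len(lines) - 3, 4):
            header, seq, plus, qual = lines[i:i + 4]
            # Header looks like "@transcript/12 <rest of the header>".
            old_id, _, rest = header[1:].partition(" ")
            new_id = f"transcript/{counter}"
            counter += 1
            # Remember where the old ID came from, since IDs repeat across files.
            mapping[(tag, old_id)] = new_id
            new_header = "@" + new_id + (" " + rest if rest else "")
            out.extend([new_header, seq, plus, qual])
    return "\n".join(out) + "\n", mapping
```

Keying the mapping on (source tag, old ID) rather than the old ID alone is what resolves the duplicate-ID problem: transcript/0 from cell 1 and transcript/0 from cell 2 stay distinguishable.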
Then I wanted to take the next steps, counting and filtering out 5'-degraded isoforms, etc., but these ask for the cluster_report.csv, which I have for each file. Since I changed the IDs, they no longer match. What is the best way to overcome this before I go ahead and rerun the whole processing from the beginning (FLNC and nFL generation, etc.)?
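One way to make the per-file cluster_report.csv files match the renamed IDs, sketched in Python: look up each (source tag, old cluster ID) pair in a dictionary recorded while renaming the headers, and write one merged report per tissue. It assumes the Iso-Seq3-style columns cluster_id,read_id,read_type; merge_cluster_reports and the mapping format are hypothetical. Clusters with no entry in the mapping (for example, ones absent from the HQ FASTQ) are dropped so that leftover old IDs cannot collide with the new ones.

```python
import csv
import io
from typing import Dict, List, Tuple


def merge_cluster_reports(
    reports: List[Tuple[str, str]],
    mapping: Dict[Tuple[str, str], str],
) -> str:
    """Rewrite cluster_id values in several cluster_report.csv texts and concatenate them.

    reports: list of (source_tag, csv_text); each source_tag must match the tag
    used when renaming that file's FASTQ headers.
    Assumes the columns cluster_id,read_id,read_type.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["cluster_id", "read_id", "read_type"])
    for tag, text in reports:
        for row in csv.DictReader(io.StringIO(text)):
            key = (tag, row["cluster_id"])
            if key not in mapping:
                # Cluster was not renamed (not in the HQ FASTQ); skip it.
                continue
            writer.writerow([mapping[key], row["read_id"], row["read_type"]])
    return out.getvalue()
```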
Question2: What is the best way to detect tissue-specific isoforms with such data? Any suggestions?
Thanks for any suggestions,
Hi Hamed,
Sorry, I did not receive it. Please send it to etseng@pacb.com or give me your email so I can request a file upload.
–Liz
Done. I hope it was delivered this time; please let me know.
On Thu, Aug 8, 2019 at 12:21 PM Hamed Bostan bostanict.net@gmail.com wrote:
– Hamed Bostan, PhD Computational Biology and Bioinformatics