Paired-end reads group with --read-length do not take R2 length into account
Hi,
I want to group reads based on UMI, alignment position and read length. I have paired-end data, so I use this command to group my reads:
python $umitools/group.py -I sorted.bam --paired --read-length --edit-distance-threshold=1 --group-out=groups.tsv -L stats.txt
When I check the groups.tsv file and extract the reads from a single group, I find that the R1 reads have the same UMI, the same alignment position and the same length. But the associated R2 reads have the same UMI and alignment position without all having the same length.
Example for a group with 7 reads:
R1 reads
[1] CTAGCGGCCAGGAGAGACCAGATCGGAAGAGCGCGAACTCCCGTC
[2] CTAGCGGCCAGGAGAGACCAGATCGGAAGAGCGCGAACTCCCGTC
[3] CTAGCGGCCAGGAGAGACCAGATCGGAAGAGCGCGAACTCCCGTC
[4] CTAGCGGCCAGGAGAGACCAGATCGGAAGAGCGCGAACTCCCGTC
[5] CTAGCGGCCAGGAGAGACCAGATCGGAAGAGCGCGAACTCCCGTC
[6] CTAGCGGCCAGGAGAGACCAGATCGGAAGAGCGCGTAATGTGCTG
[7] CTAGCGGCCAGGAGAGACCAGATCGGAAGAGCGCGTAATGTGCTG
R2 reads
aln
[1] GGTCTCTCCTGGCCGCTAG-------------------------------
[2] GGTCTCTCCTGGCCGCTAGCTGTCTCTTATACACTCTGACGCTGCCGACG
[3] GGTCTCTCCTGGCCGCTAG-------------------------------
[4] GGTCTCTCCTGGCCGCTAG-------------------------------
[5] GGTCTCTCCTGGCCGCTAG-------------------------------
[6] GGTCTCTCCTGGCCGCTAG-------------------------------
[7] GGTCTCTCCTGGCCGCTAG-------------------------------
The second R2 read has a different length. How can I tell UMI-tools to take the R2 length into account as well?
Thanks
Issue Analytics
- State:
- Created 6 years ago
- Comments:6 (4 by maintainers)
Top GitHub Comments
Just to be clear - if you run in --paired we will look for reads that have the same R1 start and the same R2 start. The lengths of R1 and R2 can be different (without --read-length) BUT the length from the start of R1 to the start of R2 will be the same.
I’m closing this issue due to low activity
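For anyone who needs R2 length reflected in the groups, one possible workaround is to sub-split the groups after the fact. The sketch below is not a UMI-tools feature and uses no UMI-tools API. It is plain Python on toy data that mirrors the 7-read example from the question. The function name split_groups_by_r2_length, the read names and the group ID 1 are all hypothetical; in practice the read names and group IDs would have to be taken from the groups.tsv output and the R2 sequences from the BAM file.

```python
from collections import defaultdict


def split_groups_by_r2_length(records):
    """Sub-split existing groups by the length of the R2 read.

    records: iterable of (read_name, group_id, r2_sequence) tuples.
    Returns a dict mapping (group_id, r2_length) -> list of read names.
    This is a hypothetical helper, not part of UMI-tools.
    """
    subgroups = defaultdict(list)
    for read_name, group_id, r2_seq in records:
        subgroups[(group_id, len(r2_seq))].append(read_name)
    return dict(subgroups)


# Toy data: the two distinct R2 sequences shown in the question.
short_r2 = "GGTCTCTCCTGGCCGCTAG"
long_r2 = "GGTCTCTCCTGGCCGCTAGCTGTCTCTTATACACTCTGACGCTGCCGACG"

# Seven reads in one (hypothetical) group; only the second has the long R2.
records = [
    (f"read{i}", 1, long_r2 if i == 2 else short_r2)
    for i in range(1, 8)
]

subgroups = split_groups_by_r2_length(records)
for (group_id, r2_len), names in sorted(subgroups.items()):
    print(group_id, r2_len, names)
```

With this toy input, the second read ends up in its own subgroup, while the other six stay together. Whether such a split is appropriate depends on why the R2 lengths differ (for example, trimming), which the thread does not settle.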