Paired-end reads group with --read-length do not take R2 length into account

Hi,

I want to group reads based on UMI, alignment position and read length. I have paired-end data, so I use this command to group my reads:

python $umitools/group.py -I sorted.bam --paired --read-length --edit-distance-threshold=1 --group-out=groups.tsv -L stats.txt

When I check the groups.tsv file and extract the reads from a single group, I find that the R1 reads have the same UMI, the same alignment position and the same length. But the associated R2 reads, although they share the same UMI and alignment position, do not all have the same length.

Example for a group with 7 reads:

R1 reads

[1] CTAGCGGCCAGGAGAGACCAGATCGGAAGAGCGCGAACTCCCGTC
[2] CTAGCGGCCAGGAGAGACCAGATCGGAAGAGCGCGAACTCCCGTC
[3] CTAGCGGCCAGGAGAGACCAGATCGGAAGAGCGCGAACTCCCGTC
[4] CTAGCGGCCAGGAGAGACCAGATCGGAAGAGCGCGAACTCCCGTC
[5] CTAGCGGCCAGGAGAGACCAGATCGGAAGAGCGCGAACTCCCGTC
[6] CTAGCGGCCAGGAGAGACCAGATCGGAAGAGCGCGTAATGTGCTG
[7] CTAGCGGCCAGGAGAGACCAGATCGGAAGAGCGCGTAATGTGCTG

R2 reads

    aln 
[1] GGTCTCTCCTGGCCGCTAG-------------------------------
[2] GGTCTCTCCTGGCCGCTAGCTGTCTCTTATACACTCTGACGCTGCCGACG
[3] GGTCTCTCCTGGCCGCTAG-------------------------------
[4] GGTCTCTCCTGGCCGCTAG-------------------------------
[5] GGTCTCTCCTGGCCGCTAG-------------------------------
[6] GGTCTCTCCTGGCCGCTAG-------------------------------
[7] GGTCTCTCCTGGCCGCTAG-------------------------------

The second R2 read has a different length from the others. How can I tell UMI-tools to take the R2 length into account as well?

Thanks
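One possible workaround, sketched here as an illustration rather than as a UMI-tools feature, is to subdivide each reported group by R2 length after grouping. The snippet below applies that idea to the seven R2 sequences from the example above; the function name `split_by_length` is hypothetical, and in practice the sequences would have to be pulled from the BAM file for each group.

```python
from collections import defaultdict

# R2 sequences copied from the example group in the question (reads 1-7).
SHORT_R2 = "GGTCTCTCCTGGCCGCTAG"
LONG_R2 = "GGTCTCTCCTGGCCGCTAGCTGTCTCTTATACACTCTGACGCTGCCGACG"

r2_reads = {
    1: SHORT_R2,
    2: LONG_R2,
    3: SHORT_R2,
    4: SHORT_R2,
    5: SHORT_R2,
    6: SHORT_R2,
    7: SHORT_R2,
}


def split_by_length(reads):
    """Hypothetical helper: map each sequence length to the read ids that have it."""
    by_length = defaultdict(list)
    for read_id, seq in reads.items():
        by_length[len(seq)].append(read_id)
    return dict(by_length)


subgroups = split_by_length(r2_reads)
for length, ids in sorted(subgroups.items()):
    print(f"R2 length {length}: reads {ids}")
```

With this data, read 2 ends up in a subgroup of its own, separate from the other six reads.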

Issue Analytics: created 6 years ago; 6 comments (4 by maintainers)

Top GitHub Comments

IanSudbery commented, Nov 16, 2017

Just to be clear: if you run with --paired, we will look for reads that have the same R1 start and the same R2 start. The lengths of R1 and R2 can be different (without --read-length), BUT the length from the start of R1 to the start of R2 will be the same.
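To make the comment above concrete, here is a minimal sketch of a grouping key built from the UMI, the R1 start and the R2 start, with the read lengths as optional extra criteria. This is an illustration only and not the UMI-tools implementation: the `Pair` record, the `group_pairs` function and both keyword arguments are hypothetical, the coordinates are invented, UMIs are compared by exact match (no edit-distance threshold), and treating the length option as applying to R1 only reflects the behaviour reported in this issue, not anything confirmed by the maintainers.

```python
from collections import defaultdict
from typing import NamedTuple


class Pair(NamedTuple):
    # Hypothetical record for illustration; not a UMI-tools or pysam type.
    name: str
    umi: str
    r1_start: int
    r2_start: int
    r1_len: int
    r2_len: int


def group_pairs(pairs, use_r1_length=False, use_r2_length=False):
    """Group read pairs by (UMI, R1 start, R2 start) plus optional lengths."""
    groups = defaultdict(list)
    for p in pairs:
        key = (
            p.umi,
            p.r1_start,
            p.r2_start,
            p.r1_len if use_r1_length else None,
            # The extra criterion asked for in the issue.
            p.r2_len if use_r2_length else None,
        )
        groups[key].append(p.name)
    return dict(groups)


# Invented example: same UMI, same starts, same R1 length, different R2 length.
pairs = [
    Pair("read1", "ACGT", 100, 126, 45, 19),
    Pair("read2", "ACGT", 100, 126, 45, 50),
    Pair("read3", "ACGT", 100, 126, 45, 19),
]

r1_only = group_pairs(pairs, use_r1_length=True)
both = group_pairs(pairs, use_r1_length=True, use_r2_length=True)
print(len(r1_only), len(both))
```

In this sketch, all three pairs fall into one group when only the R1 length is part of the key, and `read2` is split off once the R2 length is added.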

TomSmithCGAT commented, Feb 11, 2019

I’m closing this issue due to low activity.
