Paired-end reads group with --read-length do not take R2 length into account


I want to group reads based on UMI, alignment position and read length. I have paired-end data, so I use this command to group my reads:

python $umitools/ -I sorted.bam --paired --read-length --edit-distance-threshold=1 --group-out=groups.tsv -L stats.txt

When I check the groups.tsv file and extract the reads from a single group, I find that the R1 reads have the same UMI, the same alignment position and the same length. But the associated R2 reads, although they share the same UMI and alignment position, do not all have the same length.

Example for a group with 7 reads:

R1 reads


R2 reads

[1] GGTCTCTCCTGGCCGCTAG-------------------------------
[3] GGTCTCTCCTGGCCGCTAG-------------------------------
[4] GGTCTCTCCTGGCCGCTAG-------------------------------
[5] GGTCTCTCCTGGCCGCTAG-------------------------------
[6] GGTCTCTCCTGGCCGCTAG-------------------------------
[7] GGTCTCTCCTGGCCGCTAG-------------------------------

The second R2 read has a different length. How can I tell UMI-tools to also take the R2 length into account?
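One way to confirm an observation like this is to tally the R2 lengths seen in each group. The sketch below is only an illustration: it assumes the group ids and R2 sequences have already been pulled into `(group_id, r2_sequence)` tuples, which is a hypothetical layout and not the format of groups.tsv, and the sequences are made up.

```python
from collections import defaultdict


def r2_lengths_by_group(records):
    """Map each group id to the set of R2 lengths seen in that group.

    `records` is an iterable of (group_id, r2_sequence) tuples. This layout
    is an assumption for illustration, not the groups.tsv format.
    """
    lengths = defaultdict(set)
    for group_id, r2_seq in records:
        lengths[group_id].add(len(r2_seq))
    return dict(lengths)


def mixed_length_groups(records):
    """Return the sorted ids of groups whose R2 reads differ in length."""
    by_group = r2_lengths_by_group(records)
    return sorted(g for g, lens in by_group.items() if len(lens) > 1)


# Made-up records: group 0 mixes two R2 lengths, group 1 does not.
example = [
    (0, "GGTCTCTCCTGGCCGCTAG"),
    (0, "GGTCTCTCCTGGCCGCTAGAA"),
    (1, "ACGTACGT"),
    (1, "TTGCACGT"),
]

print(mixed_length_groups(example))
```

Any group id returned by `mixed_length_groups` contains R2 reads of more than one length.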


Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

IanSudbery commented, Nov 16, 2017

Just to be clear: if you run with --paired, we will look for reads that have the same R1 start and the same R2 start. The lengths of R1 and R2 can be different (without --read-length), BUT the length from the start of R1 to the start of R2 will be the same.
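The comment above can be pictured as a grouping key built from the UMI and the two start positions. The following is a toy sketch of that idea, not UMI-tools' actual code: the `ReadPair` type, the `group_key` and `group_pairs` functions and the `include_r2_len` switch are all hypothetical names, and the sketch ignores strand, UMI edit distance and everything else the real tool handles.

```python
from collections import defaultdict
from typing import NamedTuple


class ReadPair(NamedTuple):
    """Hypothetical, simplified view of one aligned read pair."""
    umi: str
    r1_start: int
    r2_start: int
    r1_len: int
    r2_len: int


def group_key(pair, include_r2_len=False):
    """Build a key from UMI, R1 start and R2 start.

    `include_r2_len` is an illustrative switch only, not a UMI-tools option.
    """
    key = (pair.umi, pair.r1_start, pair.r2_start)
    if include_r2_len:
        key += (pair.r2_len,)
    return key


def group_pairs(pairs, include_r2_len=False):
    """Collect read pairs that share the same key."""
    groups = defaultdict(list)
    for pair in pairs:
        groups[group_key(pair, include_r2_len)].append(pair)
    return dict(groups)


# Made-up pairs: same UMI and start positions, one shorter R2.
pairs = [
    ReadPair("ACGT", 100, 300, 50, 50),
    ReadPair("ACGT", 100, 300, 50, 42),
    ReadPair("ACGT", 100, 300, 50, 50),
]

print(len(group_pairs(pairs)))                       # 1
print(len(group_pairs(pairs, include_r2_len=True)))  # 2
```

With start positions alone, the pair with the shorter R2 lands in the same group as the others. Adding the R2 length to the key, as a post-processing step on an existing group, would separate it.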

TomSmithCGAT commented, Feb 11, 2019

I’m closing this issue due to low activity.
