Bug in --info-file with --revcomp
$ cutadapt 2>&1 | head -n1
This is cutadapt 3.3 with Python 3.8.5
There seems to be a problem with the --info-file output when the adapter is found in the reverse complement (--revcomp):
$ cat test.fna
>test
TATCAGCTCACTCAAAGGCGGTAATACGGTTATCCACAGAATCAGGGGATAACGCAGGAAAGAACATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCG
>test_rv
CGGTTCCTGGCCTTTTGCTGGCCTTTTGCTCACATGTTCTTTCCTGCGTTATCCCCTGATTCTGTGGATAACCGTATTACCGCCTTTGAGTGAGCTGATA
$
$ cutadapt --report=minimal --info-file test.info --revcomp -g TATCAGCTCACT test.fna -o /dev/null
[8<----------] 00:00:00 2 reads @ 237.0 µs/read; 0.25 M reads/minute
status in_reads in_bp too_short too_long too_many_n out_reads w/adapters qualtrim_bp out_bp
OK 2 200 0 0 0 2 2 0 176
$
$ cat test.info
test 0 0 12 TATCAGCTCACT CAAAGGCGGTAATACGGTTATCCACAGAATCAGGGGATAACGCAGGAAAGAACATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCG 1
test_rv rc 0 0 12 CGGTTCCTGGCC TTTTGCTGGCCTTTTGCTCACATGTTCTTTCCTGCGTTATCCCCTGATTCTGTGGATAACCGTATTACCGCCTTTGAGTGAGCTGATA 1
Above are two identical reads that differ only in orientation. cutadapt correctly identifies the adapter in both, but in the info file the sequences in columns 5-7 are incorrect for the read matched in the reverse complement: they contain substrings extracted from the wrong strand. The coordinates in columns 3-4 are usable in principle, since a parser can interpret them as referring to the opposite strand based on the " rc" flag (although false positives can arise if an input read already contains this very string "rc" in its definition line for some other reason, e.g. ">my-read read with rc"), but columns 5-7 are plainly unreliable.
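Until this is fixed, a parser can ignore columns 5-7 and rebuild the three segments itself from the original read and the coordinates in columns 3-4. The following is only a minimal sketch of that workaround, not cutadapt's own code. It assumes the info file is tab-separated with the column layout shown above and that a read name ending in " rc" marks a reverse-complement match; the names `parse_info_line` and `segments_on_matched_strand` are hypothetical helpers introduced for illustration.

```python
from typing import NamedTuple, Optional, Tuple

# Translation table for complementing DNA (N is left unchanged).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


class InfoRecord(NamedTuple):
    name: str
    errors: int
    start: int   # column 3, 0-based start of the adapter match
    end: int     # column 4, end of the adapter match (exclusive)
    is_rc: bool  # assumption: a trailing " rc" in the name marks a revcomp match


def parse_info_line(line: str) -> Optional[InfoRecord]:
    """Parse columns 1-4 of one info-file line; return None if no adapter was found."""
    fields = line.rstrip("\n").split("\t")
    if fields[1] == "-1":  # lines for reads without a match have -1 in column 2
        return None
    name = fields[0]
    return InfoRecord(name, int(fields[1]), int(fields[2]), int(fields[3]),
                      name.endswith(" rc"))


def segments_on_matched_strand(read_seq: str, rec: InfoRecord) -> Tuple[str, str, str]:
    """Recompute (left, match, right) on the strand the coordinates refer to."""
    seq = reverse_complement(read_seq) if rec.is_rc else read_seq
    return seq[:rec.start], seq[rec.start:rec.end], seq[rec.end:]


if __name__ == "__main__":
    # The test_rv read from the example above, as stored in test.fna.
    test_rv = ("CGGTTCCTGGCCTTTTGCTGGCCTTTTGCTCACATGTTCTTTCCTGCGTTATCCCCTGATTCTG"
               "TGGATAACCGTATTACCGCCTTTGAGTGAGCTGATA")
    rec = parse_info_line("test_rv rc\t0\t0\t12\t\tCGGTTCCTGGCC\tTTTTGC\t1\n")
    left, match, right = segments_on_matched_strand(test_rv, rec)
    print(match)  # TATCAGCTCACT
```

Because the " rc" test relies on the read name alone, it inherits the false-positive risk described above for reads whose original header already ends in " rc".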
Issue Analytics
- State:
- Created 3 years ago
- Comments:5 (2 by maintainers)
Top GitHub Comments
I’m personally totally happy. My idea is actually to locate spliced leader RNA which is in a way merely a (potentially truncated) adapter added at 5’ to transcripts e.g. in dinoflagellates.
@marcelm Thanks a lot!