Finding all adapter occurrences in each read
cutadapt 1.13 and Python 3.5.2
- Installed with pip
Command line:
cutadapt -g file:5p_adapters.fasta --times 8 -O 15 --info-file test.tsv -o /dev/null test.fasta
I am trying to trim multiple adapters from a sequence and parse the info file to see what coordinates they were trimmed at. Unfortunately I can’t provide the actual sequences used to trim or being trimmed.
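For anyone wanting to do the same kind of parsing, here is a minimal sketch of reading the info file. It assumes the tab-separated layout described in the cutadapt documentation (read name, number of errors, 0-based start, end, three sequence pieces, then the adapter name, with -1 in the second column when nothing matched); please check this against the version you run. The function name parse_info_lines is made up for this example and is not part of cutadapt.

```python
from collections import defaultdict


def parse_info_lines(lines):
    """Collect (adapter, start, end, errors) tuples per read name.

    Assumed column layout (verify against your cutadapt version):
    0 read name, 1 errors (-1 = no match), 2 start, 3 end,
    4-6 sequence left of / inside / right of the match, 7 adapter name.
    """
    matches = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip "no adapter found" lines and anything too short to parse.
        if len(fields) < 8 or fields[1] == "-1":
            continue
        read_name = fields[0]
        errors = int(fields[1])
        start, end = int(fields[2]), int(fields[3])
        adapter_name = fields[7]
        matches[read_name].append((adapter_name, start, end, errors))
    return dict(matches)


if __name__ == "__main__":
    # test.tsv is the file written by --info-file in the command above.
    with open("test.tsv") as handle:
        for read, hits in parse_info_lines(handle).items():
            print(read, hits)
```

As far as I understand the documentation, with --times greater than 1 each trimming round gets its own line, and later lines describe the read as it was after the earlier rounds, so the coordinates are not all relative to the original read.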
For this particular sequence I expect 3 adapters to be trimmed from the 5’ end. I will call them 5p_a1, 5p_a2, and 5p_a3.
They occur in this order in the target sequence and in the adapters FASTA I am using.
When I run the above command, however, 5p_a1 and 5p_a3 are removed but not 5p_a2.
I get the following output from cutadapt:
=== Adapter 5p_a1 ===
Sequence: AAAAAAAAAAAAAAAAAAAAAA; Type: regular 5'; Length: 22; Trimmed: 1 times.
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-22 bp: 2
Overview of removed sequences
length count expect max.err error counts
8540 1 0.0 2 1
=== Adapter 5p_a2 ===
Sequence: GGGGGGGGGGGGGGGGGGGGG; Type: regular 5'; Length: 21; Trimmed: 0 times.
=== Adapter 5p_a3 ===
Sequence: CCCCCCCCCCCCCCCCCCCCC; Type: regular 5'; Length: 21; Trimmed: 1 times.
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-21 bp: 2
Overview of removed sequences
length count expect max.err error counts
113 1 0.0 2 1
So 5p_a2 is not trimmed.
However, if I remove 5p_a3 from the adapters file I get the following output:
=== Adapter 5p_a1 ===
Sequence: AAAAAAAAAAAAAAAAAAAAAA; Type: regular 5'; Length: 22; Trimmed: 1 times.
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-22 bp: 2
Overview of removed sequences
length count expect max.err error counts
8540 1 0.0 2 1
=== Adapter 5p_a2 ===
Sequence: GGGGGGGGGGGGGGGGGGGGG; Type: regular 5'; Length: 21; Trimmed: 0 times.
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-21 bp: 2
Overview of removed sequences
length count expect max.err error counts
60 1 0.0 2 0 1
5p_a2 is trimmed!
I can see that the error counts are slightly different for 5p_a2: 0 1 compared to 1.
But both have 1 total error, so I am confused as to why this causes 5p_a2 to be skipped.
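The "No. of allowed errors" rows in the reports above are consistent with multiplying the match length by the maximum error rate and rounding down. A small sketch that reproduces those rows, assuming the default rate of 0.1 (no -e was given on the command line); the helper names are hypothetical and are not cutadapt functions:

```python
def allowed_errors(match_length, max_error_rate=0.1):
    """Errors tolerated for a match of this length (rounded down)."""
    return int(max_error_rate * match_length)


def error_table(adapter_length, max_error_rate=0.1):
    """Return (first_length, last_length, allowed_errors) rows,
    mirroring the 'No. of allowed errors' line of the report."""
    rows = []
    start = 0
    for length in range(1, adapter_length + 1):
        # Start a new row whenever the tolerated error count changes.
        if allowed_errors(length, max_error_rate) != allowed_errors(start, max_error_rate):
            rows.append((start, length - 1, allowed_errors(start, max_error_rate)))
            start = length
    rows.append((start, adapter_length, allowed_errors(start, max_error_rate)))
    return rows


for first, last, errors in error_table(22):
    print("{}-{} bp: {}".format(first, last, errors))
```

With -O 15 on the command line, only matches of at least 15 bp are considered, so under this assumption the relevant rows are the ones allowing 1 or 2 errors.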
Top GitHub Comments
Ah ok, I think I understand what your query is now.
So the assumption is that I don’t actually know the order the adapters occur in, or at least I may not. Part of the reason for performing this step is that, assuming we expected to see A B C, in the A C B C scenario we want to know that C could actually also trim there. So the intention would be that it would detect as many as possible by default, I suppose, with --times in this case acting as a sort of limiter to enhance speed.
In reality, I know roughly how many times I expect the adapter sequences to occur (once per adapter) but not how many of the sites are actually present (i.e. 2 out of 4 adapters vs. all 4).
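To make the A C B C scenario concrete, here is a rough standalone sketch of what "report every site where any adapter could match" might look like. It is not cutadapt's alignment algorithm and uses no cutadapt API: it only compares each adapter against every offset of the read and counts substitutions (no indels, no partial matches). The function name find_all_sites and the toy sequences are invented for illustration.

```python
def find_all_sites(read, adapters, max_error_rate=0.1):
    """Return sorted (start, end, adapter_name, mismatches) tuples for
    every full-length, substitution-only match of any adapter in read."""
    hits = []
    for name, adapter in adapters.items():
        max_errors = int(max_error_rate * len(adapter))
        for start in range(len(read) - len(adapter) + 1):
            window = read[start:start + len(adapter)]
            mismatches = sum(a != b for a, b in zip(adapter, window))
            if mismatches <= max_errors:
                hits.append((start, start + len(adapter), name, mismatches))
    return sorted(hits)


# Toy data: three 8 bp adapters, laid out in the read as A C B C.
adapters = {"A": "AAAAAAAA", "B": "GGGGGGGG", "C": "CCCCCCCC"}
read = "AAAAAAAA" + "T" + "CCCCCCCC" + "T" + "GGGGGGGG" + "T" + "CCCCCCCC"

for start, end, name, mismatches in find_all_sites(read, adapters):
    print(name, start, end, mismatches)
```

A scan like this reports both C sites regardless of the order in the adapters file, which is the information I am after; a limit comparable to --times could then cap how many of the reported sites are actually acted on.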
I will take some time over the holidays to play with some sort of Modifier class then, and see if I can come up with something which seems like it might be useful to more than just me.
I’m closing this now as you suggested, but I’m adding it to my long-term to-do list (whatever that means).