Finding all adapter occurrences in each read

  • cutadapt 1.13 and Python 3.5.2
  • Installed with pip

Command line:

cutadapt -g file:5p_adapters.fasta --times 8 -O 15 --info-file test.tsv -o /dev/null test.fasta

I am trying to trim multiple adapters from a sequence and parse the info file to see what coordinates they were trimmed at. Unfortunately I can’t provide the actual sequences used to trim or being trimmed.
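For reference, parsing the info file could look roughly like the sketch below. It assumes the tab-separated layout described in the cutadapt documentation (read name, number of errors, match start, match end, sequence left of the match, matched sequence, sequence right of the match, adapter name), with -1 in the second column for reads without a match. The function name and the example rows are made up for illustration; check the column layout against the documentation for your cutadapt version.

```python
import io
from typing import Dict, Iterable, List


def parse_info_lines(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Collect one record per adapter match from info-file lines."""
    matches = []
    for line in lines:
        row = line.rstrip("\n").split("\t")
        # Assumed layout: name, errors, start, end, left, match, right, adapter, ...
        if len(row) < 8 or row[1] == "-1":
            continue  # "-1" in column 2 is assumed to mean "no adapter found"
        matches.append({
            "read": row[0],
            "errors": int(row[1]),
            "start": int(row[2]),  # assumed 0-based start of the match
            "end": int(row[3]),    # assumed end of the match (exclusive)
            "adapter": row[7],
        })
    return matches


# Made-up example rows, not real cutadapt output.
example = io.StringIO(
    "read1\t1\t4\t8\tTTTT\tACGT\tGGGGCCCC\tadapter_x\n"
    "read1\t0\t0\t4\t\tGGGG\tCCCC\tadapter_y\n"
    "read2\t-1\tACGTACGT\n"
)
records = parse_info_lines(example)
for record in records:
    print(record["read"], record["adapter"], record["start"], record["end"])
```

With a real run, the same function could be given an open file handle for test.tsv instead of the in-memory example.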

For this particular sequence I expect 3 adapters to be trimmed from the 5’ end.

I will call them 5p_a1, 5p_a2, and 5p_a3. They occur in this order both in the target sequence and in the adapters FASTA file I am using. When I run the above command, however, 5p_a1 and 5p_a3 are removed but 5p_a2 is not.

I get the following output from cutadapt:

=== Adapter 5p_a1 ===

Sequence: AAAAAAAAAAAAAAAAAAAAAA; Type: regular 5'; Length: 22; Trimmed: 1 times.

No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-22 bp: 2

Overview of removed sequences
length  count   expect  max.err error counts
8540    1       0.0     2       1

=== Adapter 5p_a2 ===

Sequence: GGGGGGGGGGGGGGGGGGGGG; Type: regular 5'; Length: 21; Trimmed: 0 times.

=== Adapter 5p_a3 ===

Sequence: CCCCCCCCCCCCCCCCCCCCC; Type: regular 5'; Length: 21; Trimmed: 1 times.

No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-21 bp: 2

Overview of removed sequences
length  count   expect  max.err error counts
113     1       0.0     2       1

So 5p_a2 is not trimmed. However, if I remove 5p_a3 from the adapters file, I get the following output:

=== Adapter 5p_a1 ===

Sequence: AAAAAAAAAAAAAAAAAAAAAA; Type: regular 5'; Length: 22; Trimmed: 1 times.

No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-22 bp: 2

Overview of removed sequences
length  count   expect  max.err error counts
8540    1       0.0     2       1

=== Adapter 5p_a2 ===

Sequence: GGGGGGGGGGGGGGGGGGGGG; Type: regular 5'; Length: 21; Trimmed: 0 times.

No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-21 bp: 2

Overview of removed sequences
length  count   expect  max.err error counts
60      1       0.0     2       0 1

5p_a2 is trimmed!

I can see that the error counts differ slightly: 0 1 for 5p_a2 compared with 1 for 5p_a3. Both have 1 error in total, though, so I am confused as to why 5p_a2 is skipped in the first run.
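One possible reading of the two reports, offered as an assumption rather than a confirmed diagnosis: for a 5' adapter given with -g, cutadapt removes the adapter together with everything that precedes it. The removed lengths (60 for 5p_a2, 113 for 5p_a3) suggest that 5p_a2 sits inside the 113 bases removed along with 5p_a3, so if 5p_a3 is chosen in a round, 5p_a2 is already gone before it can be reported. The toy sketch below uses exact string matching and invented sequences; it is not cutadapt's alignment code, and trim_five_prime is a hypothetical helper.

```python
from typing import Optional, Tuple


def trim_five_prime(read: str, adapter: str) -> Tuple[str, Optional[int]]:
    """Remove the first exact occurrence of adapter and everything before it.

    Returns the remaining read and the number of bases removed (None if no hit).
    """
    start = read.find(adapter)
    if start == -1:
        return read, None
    removed = start + len(adapter)
    return read[removed:], removed


# Hypothetical sequences for illustration only.
a2, a3 = "GGGG", "CCCC"
read = "TT" + a2 + "TTT" + a3 + "ACGTACGT"

# Trimming the downstream adapter first also discards the upstream one.
after_a3, removed_a3 = trim_five_prime(read, a3)
after_both, removed_a2 = trim_five_prime(after_a3, a2)
print(removed_a3, removed_a2)

# Trimming in upstream-to-downstream order reports both adapters.
step1, first_removed = trim_five_prime(read, a2)
step2, second_removed = trim_five_prime(step1, a3)
print(first_removed, second_removed)
```

In this toy model, the order in which matches are picked decides whether the upstream adapter ever shows up in the report.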

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 14 (7 by maintainers)

Top GitHub Comments

awgymer commented, Dec 17, 2018

Ah ok, I think I understand what your query is now.

So the assumption is that I don’t actually know the order in which the adapters occur, or at least I may not. Part of the reason for performing this step is that, if we expected to see A B C but the read is actually A C B C, we want to know that C could also trim at that position. So I suppose the intention would be to detect as many occurrences as possible by default, with --times acting as a sort of limiter to improve speed.

In reality, I know roughly how many times I expect the adapter sequences to occur (once per adapter) but not how many of the sites are actually present (e.g. 2 out of 4 adapters versus all 4).

I will take some time over the holidays to play with some sort of Modifier class then, and see if I can come up with something which seems like it might be useful to more than just me.
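As a starting point for that idea, here is a minimal standalone sketch that reports every adapter occurrence in a read without trimming anything. It does not use cutadapt's internal Modifier classes or its error-tolerant alignment; find_all_occurrences, the max_hits parameter (loosely analogous to --times as a limiter), and the sequences are all hypothetical, and matching is exact only.

```python
from typing import Dict, List, Optional, Tuple


def find_all_occurrences(
    read: str, adapters: Dict[str, str], max_hits: Optional[int] = None
) -> List[Tuple[int, int, str]]:
    """Return (start, end, adapter_name) for every exact hit, sorted by start."""
    hits = []
    for name, seq in adapters.items():
        pos = read.find(seq)
        while pos != -1:
            hits.append((pos, pos + len(seq), name))
            pos = read.find(seq, pos + 1)  # step by 1 so overlapping hits are kept
    hits.sort()
    return hits if max_hits is None else hits[:max_hits]


# Invented adapters arranged as in the "A C B C" scenario.
adapters = {"A": "AACC", "B": "GGTT", "C": "CAGT"}
read = "AACC" + "TT" + "CAGT" + "GGTT" + "CAGT" + "ACGT"

all_hits = find_all_occurrences(read, adapters)
first_two = find_all_occurrences(read, adapters, max_hits=2)
for start, end, name in all_hits:
    print(name, start, end)
```

Because the search advances one base at a time, low-complexity adapters such as homopolymers would produce many overlapping hits, so a real implementation would likely need to merge or filter them.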

marcelm commented, Jul 9, 2019

I’m closing this now as you suggested, but I’m adding it to my long-term to-do list (whatever that means).
