5' adapter not trimmed if read ends early

Given a read with a 5' adapter, e.g. ALONGADAPTORsequence, if the sequence is low quality at the end or has a polyG tail, cutadapt will trim it to ALONGADAPTORseq (1st case) or ALONGADAP (2nd case). The -g argument then removes the adapter in the 1st case but not in the 2nd, which causes adapter contamination in the filtered reads.

_Originally posted by @yech1990 in https://github.com/marcelm/cutadapt/issues/550#issuecomment-921100958_

Issue Analytics

  • State: open
  • Created 2 years ago
  • Reactions: 2
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
y9c commented, Oct 1, 2021

Thank you very much. I found this argument in the documentation now. 😂

On Fri, Oct 1, 2021, 04:14 Marcel Martin wrote:

Yes, you can use the anywhere adapter-trimming parameter. Just write -a 'ALONGADAPTOR;anywhere'. This works:

$ echo -e '>r\nADAPTORGGGGGGGGGGGGGGGGGG' | cutadapt -N --quiet -a 'ALONGADAPTOR;anywhere' -

r

(The third example, as you gave it, will actually work because the error rate is 0.1 by default, which allows the initial L to be deleted.)
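
As a rough illustration of the arithmetic behind that remark: the sketch below assumes that the number of errors tolerated for a full-length match is the adapter length times the maximum error rate, rounded down, which is my reading of the Cutadapt user guide. The function name `allowed_errors` is made up for this example and is not part of Cutadapt.

```python
import math


def allowed_errors(adapter: str, max_error_rate: float = 0.1) -> int:
    """Errors tolerated for a full-length match (assumed rule: floor(length * rate))."""
    return math.floor(len(adapter) * max_error_rate)


# ALONGADAPTOR has 12 bases, so a 0.1 error rate leaves room for one error,
# for example the deletion of a single base.
print(allowed_errors("ALONGADAPTOR"))
# A 7-base adapter such as ADAPTER gets no error budget at the default rate.
print(allowed_errors("ADAPTER"))
```

Under that assumption a 12-base adapter tolerates exactly one edit, which would be enough to absorb one deleted base.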

1 reaction
marcelm commented, Sep 17, 2021

Thanks for reporting, this is indeed a problem that should be fixed.

Some notes to myself and/or those interested in the details.

  • This needs to be fixed for three different cases: For regular 5’ adapters, for anchored 5’ adapters and for anchored 5’ adapters in combination with --no-indels, which uses a different algorithm.
  • When aligning regular 5’ adapters, partial matches are only allowed at the 5’ end, not at the 3’ end. So an adapter ADAPTER will be found in PTERSEQUENCE, but not in SEQUENCEADAPT. This is necessary because otherwise short, random matches at the 3’ end would lead to the entire read being trimmed. It is also symmetrical to how 3’ adapters are aligned. (For those, partial occurrences at the 5’ end are not allowed.)
  • Not allowing partial matches at the 3’ end is why the adapters are not found in the examples above.
  • We therefore need to change the alignment algorithm such that it allows partial matches in those cases where the adapter is longer than the read.
  • Idea: For an adapter of length m and a read of length n, we look in the right-hand column of the alignment matrix not only at row index m, as currently, but also from n to m if n < m.
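
The asymmetry described in these notes can be sketched with exact matching only. This is a simplified illustration, not Cutadapt's actual alignment code: the function `find_5prime_exact` and its `min_overlap` parameter are hypothetical, and errors are not modelled at all.

```python
def find_5prime_exact(adapter: str, read: str, min_overlap: int = 3):
    """Return how many leading read bases to remove, or None if no match.

    Simplified model of a regular 5' adapter, exact matches only.
    """
    # Full occurrence anywhere in the read: remove it and everything before it.
    pos = read.find(adapter)
    if pos != -1:
        return pos + len(adapter)
    # Partial occurrence at the 5' end: a suffix of the adapter is a prefix of the read.
    for k in range(len(adapter) - 1, min_overlap - 1, -1):
        if read.startswith(adapter[-k:]):
            return k
    # A prefix of the adapter at the 3' end of the read is deliberately not checked.
    return None


print(find_5prime_exact("ADAPTER", "PTERSEQUENCE"))          # partial match at the 5' end
print(find_5prime_exact("ADAPTER", "SEQUENCEADAPT"))         # 3' partial match is ignored
print(find_5prime_exact("ALONGADAPTOR", "ALONGADAPTORseq"))  # 1st case from the report
print(find_5prime_exact("ALONGADAPTOR", "ALONGADAP"))        # 2nd case: read ends early
```

In this model the read ALONGADAP is indistinguishable from a short random match at the 3' end, which is why the adapter is left in place.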

One test case that should work is to find a regular 5’ adapter ADAPTER, allowing 1 error (ignoring error rate), within TADAPT. The alignment could look like this:

 ADAPTER
TADAP-
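
A minimal sketch of how the proposed rule could behave on this test case, using a plain edit-distance matrix. Several things here are assumptions made for illustration: the function name `find_5prime` and the `allow_short_read` switch are invented, partial matches at the 5' end of the read are not modelled, and a fixed error count is used instead of an error rate. It is not how Cutadapt's aligner is implemented.

```python
def find_5prime(adapter, read, max_errors, allow_short_read=False):
    """Return (errors, adapter_bases_aligned, read_bases_removed) or None."""
    m, n = len(adapter), len(read)
    # cost[i][j]: fewest edits needed to align adapter[:i] with some substring
    # of the read that ends at position j. Row 0 stays at zero, so the adapter
    # may start anywhere in the read at no cost.
    cost = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        cost[i][0] = i  # adapter bases with no read bases to pair with
        for j in range(1, n + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + (adapter[i - 1] != read[j - 1]),  # match or mismatch
                cost[i - 1][j] + 1,  # adapter base missing from the read
                cost[i][j - 1] + 1,  # extra base in the read
            )
    # Current rule in this sketch: the whole adapter must be aligned (row m).
    candidates = [(cost[m][j], m, j) for j in range(1, n + 1)]
    if allow_short_read and n < m:
        # Proposed rule: in the right-hand column, also accept rows n to m - 1,
        # that is, an adapter prefix that runs into the end of a short read.
        candidates += [(cost[i][n], i, n) for i in range(n, m)]
    best = min(candidates, key=lambda c: (c[0], -c[1]))
    return best if best[0] <= max_errors else None


print(find_5prime("ADAPTER", "TADAPT", max_errors=1))
print(find_5prime("ADAPTER", "TADAPT", max_errors=1, allow_short_read=True))
```

With one error allowed, the first call finds nothing because aligning the full adapter costs two edits, while the second call accepts the six-base adapter prefix at a cost of one edit and removes the whole read.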