Primers remaining in the output files after demultiplexing

Hello!

I am using cutadapt to demultiplex my reads (v2.10 with Python 3.7.6).

Basically, I have a pair of fastq files (e.g. input_R1_001.fastq and input_R2_001.fastq) which contain sequences from three genes (18S, COI and ITS) and I want to create a pair of fastq files for each gene.

Here is an example of the commands I am using:

cutadapt \
  --pair-adapters \
  -g TGGTGCATGGCCGTTCTTAGT \
  -a GGTCTGTGATGCCCTTAGATG \
  -G CATCTAAGGGCATCACAGACC \
  -A ACTAAGAACGGCCATGCACCA \
  -o 18S_R1_001.fastq \
  -p 18S_R2_001.fastq \
  input_R1_001.fastq \
  input_R2_001.fastq \
  --untrimmed-output R1_untrimmed_18S.fastq \
  --untrimmed-paired-output R2_untrimmed_18S.fastq

cutadapt \
  --pair-adapters \
  -g GGWACWGGWTGAACWGTWTAYCCYCC \
  -a TGRTTYTTYGGNCAYCCNGARGTNTA \
  -G TANACYTCNGGRTGNCCRAARAAYCA \
  -A GGRGGRTAWACWGTTCAWCCWGTWCC \
  -o COI_R1_001.fastq \
  -p COI_R2_001.fastq \
  input_R1_001.fastq \
  input_R2_001.fastq \
  --untrimmed-output R1_untrimmed_COI.fastq \
  --untrimmed-paired-output R2_untrimmed_COI.fastq

cutadapt \
  --pair-adapters \
  -g CTTGGTCATTTAGAGGAAGTAA \
  -a GCATCGATGAAGAACGCAGC \
  -G GCTGCGTTCTTCATCGATGC \
  -A TTACTTCCTCTAAATGACCAAG \
  -o ITS_R1_001.fastq \
  -p ITS_R2_001.fastq \
  input_R1_001.fastq \
  input_R2_001.fastq \
  --untrimmed-output R1_untrimmed_ITS.fastq \
  --untrimmed-paired-output R2_untrimmed_ITS.fastq

The problem is that there are still primers found in the output files, so they are not completely trimmed/removed by cutadapt. Do you have any idea why this might be happening?

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

1 reaction
marcelm commented, Jun 6, 2021

For each of the commands that you listed, either the 5’ adapter or the 3’ adapter is removed from each read (depending on which one matches better) because Cutadapt by default removes only a single adapter from each read. (This holds even for paired-end reads: It will remove at most one from R1 and at most one from R2.)

You should probably use linked adapters if you want to remove a 5’ and a 3’ adapter at the same time.

So the first command would look something like this:

cutadapt \
  --pair-adapters \
  -a ^TGGTGCATGGCCGTTCTTAGT...GGTCTGTGATGCCCTTAGATG \
  -A ^CATCTAAGGGCATCACAGACC...ACTAAGAACGGCCATGCACCA \
  -o 18S_R1_001.fastq \
  -p 18S_R2_001.fastq \
  --untrimmed-output R1_untrimmed_18S.fastq \
  --untrimmed-paired-output R2_untrimmed_18S.fastq \
  input_R1_001.fastq \
  input_R2_001.fastq 

However, you can also use Cutadapt’s demultiplexing support to combine the three commands into a single one (untested):

cutadapt \
  --pair-adapters \
  -a 18S=^TGGTGCATGGCCGTTCTTAGT...GGTCTGTGATGCCCTTAGATG \
  -A ^CATCTAAGGGCATCACAGACC...ACTAAGAACGGCCATGCACCA \
  -a COI=^GGWACWGGWTGAACWGTWTAYCCYCC...TGRTTYTTYGGNCAYCCNGARGTNTA \
  -A ^TANACYTCNGGRTGNCCRAARAAYCA...GGRGGRTAWACWGTTCAWCCWGTWCC \
  -a ITS=^CTTGGTCATTTAGAGGAAGTAA...GCATCGATGAAGAACGCAGC \
  -A ^GCTGCGTTCTTCATCGATGC...TTACTTCCTCTAAATGACCAAG \
  -o "output_{name}_R1.fastq" \
  -p "output_{name}_R2.fastq" \
  input_R1_001.fastq \
  input_R2_001.fastq

This will then create six per-gene output files (output_18S_R1.fastq, output_COI_R1.fastq and so on), plus output_unknown_R1.fastq and output_unknown_R2.fastq for the reads without a match.
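
The linked-adapter strings in the commands above follow a regular pattern: the anchored primer at the start of the read, three dots, then the reverse complement of the primer that starts the mate read. Below is a minimal sketch of how those strings could be derived from just the two primers per gene. The helper names `revcomp` and `linked_adapter` are hypothetical (they are not part of Cutadapt), and the sketch assumes uppercase sequences.

```shell
# Reverse-complement a DNA sequence, including IUPAC ambiguity codes.
# Assumes uppercase input; W, S and N are their own complements.
revcomp() {
  printf '%s\n' "$1" |
    awk '{ out = ""; for (i = length($0); i >= 1; i--) out = out substr($0, i, 1); print out }' |
    tr 'ACGTRYKMBDHV' 'TGCAYRMKVHDB'
}

# Build a linked-adapter spec: anchored 5' primer, "...", then the 3' sequence.
# $1 = primer at the start of this read, $2 = primer at the start of the mate read.
linked_adapter() {
  printf '^%s...%s\n' "$1" "$(revcomp "$2")"
}

# 18S primers copied from the commands in the question.
r1_primer=TGGTGCATGGCCGTTCTTAGT
r2_primer=CATCTAAGGGCATCACAGACC

linked_adapter "$r1_primer" "$r2_primer"   # spec to pass to -a (R1)
linked_adapter "$r2_primer" "$r1_primer"   # spec to pass to -A (R2)
```

For the 18S primers this prints the same two strings that appear after `-a` and `-A` in the first command above, which makes it a quick way to double-check hand-typed adapter sequences.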

0 reactions
naurasd commented, Aug 25, 2021

Just as a little hint when using the following demultiplexing command:

cutadapt \
  --pair-adapters \
  -a 18S=^TGGTGCATGGCCGTTCTTAGT...GGTCTGTGATGCCCTTAGATG \
  -A ^CATCTAAGGGCATCACAGACC...ACTAAGAACGGCCATGCACCA \
  -a COI=^GGWACWGGWTGAACWGTWTAYCCYCC...TGRTTYTTYGGNCAYCCNGARGTNTA \
  -A ^TANACYTCNGGRTGNCCRAARAAYCA...GGRGGRTAWACWGTTCAWCCWGTWCC \
  -a ITS=^CTTGGTCATTTAGAGGAAGTAA...GCATCGATGAAGAACGCAGC \
  -A ^GCTGCGTTCTTCATCGATGC...TTACTTCCTCTAAATGACCAAG \
  -o "output_{name}_R1.fastq" \
  -p "output_{name}_R2.fastq" \
  input_R1_001.fastq \
  input_R2_001.fastq

In our case, the demultiplexing resulted in quite a few reads with a sequence length of zero bp, which created some problems for downstream analysis. It would probably be a good idea to add

-m 1

to the command (or whatever minimum length other than 1 suits you) to discard zero bp reads.

Cheers Nauras
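
To see whether a demultiplexed file is affected, the empty reads can be counted directly. This is a minimal sketch, assuming an uncompressed FASTQ file with exactly four lines per record; the function name `count_empty_reads` is made up for illustration.

```shell
# Count records whose sequence line (line 2 of each 4-line record) is empty.
# Assumes uncompressed FASTQ with exactly four lines per record.
count_empty_reads() {
  awk 'NR % 4 == 2 && length($0) == 0 { n++ } END { print n + 0 }' "$1"
}

# Example call (file name taken from the demultiplexing command above):
#   count_empty_reads output_18S_R1.fastq
```

A non-zero count suggests adding `-m 1` (short for `--minimum-length 1`) to the cutadapt command, as described above. For gzip-compressed output, the file would need to be decompressed first, for example with `gzip -dc`, before being passed to the function.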
