Primers remaining in the output files after demultiplexing

Hello!

I am using cutadapt to demultiplex my reads (v2.10 with Python 3.7.6).

Basically, I have a pair of fastq files (e.g. input_R1_001.fastq and input_R2_001.fastq) which contain sequences from three genes (18S, COI and ITS) and I want to create a pair of fastq files for each gene.

Here are the commands I am using, one per gene:

cutadapt --pair-adapters -g TGGTGCATGGCCGTTCTTAGT -a GGTCTGTGATGCCCTTAGATG -G CATCTAAGGGCATCACAGACC -A ACTAAGAACGGCCATGCACCA -o 18S_R1_001.fastq -p 18S_R2_001.fastq input_R1_001.fastq input_R2_001.fastq --untrimmed-output R1_untrimmed_18S.fastq --untrimmed-paired-output R2_untrimmed_18S.fastq

cutadapt --pair-adapters -g GGWACWGGWTGAACWGTWTAYCCYCC -a TGRTTYTTYGGNCAYCCNGARGTNTA -G TANACYTCNGGRTGNCCRAARAAYCA -A GGRGGRTAWACWGTTCAWCCWGTWCC -o COI_R1_001.fastq -p COI_R2_001.fastq input_R1_001.fastq input_R2_001.fastq --untrimmed-output R1_untrimmed_COI.fastq --untrimmed-paired-output R2_untrimmed_COI.fastq

cutadapt --pair-adapters -g CTTGGTCATTTAGAGGAAGTAA -a GCATCGATGAAGAACGCAGC -G GCTGCGTTCTTCATCGATGC -A TTACTTCCTCTAAATGACCAAG -o ITS_R1_001.fastq -p ITS_R2_001.fastq input_R1_001.fastq input_R2_001.fastq --untrimmed-output R1_untrimmed_ITS.fastq --untrimmed-paired-output R2_untrimmed_ITS.fastq

The problem is that primer sequences are still present in the output files, so Cutadapt is not removing them completely. Do you have any idea why this might be happening?

marcelm commented, Jun 6, 2021

For each of the commands that you listed, either the 5’ adapter or the 3’ adapter is removed from each read (depending on which one matches better) because Cutadapt by default removes only a single adapter from each read. (This holds even for paired-end reads: It will remove at most one from R1 and at most one from R2.)
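One way to check this on the original output files is to count how many trimmed R1 reads still begin with the 5’ primer: if only the 3’ adapter was removed from a read, the 5’ primer should still be at its start. The function `count_reads_with_prefix` below is a hypothetical shell helper, not part of Cutadapt. It assumes uncompressed FASTQ with exactly four lines per record and does an exact, literal match only, so it will miss primers containing sequencing errors and cannot handle degenerate IUPAC codes such as those in the COI primers.

```shell
# count_reads_with_prefix FILE PRIMER
# Hypothetical helper (not part of Cutadapt): prints how many FASTQ records
# in FILE have a sequence that begins with PRIMER.
# Assumes uncompressed FASTQ with exactly four lines per record.
# Exact literal match only: no mismatches, indels, or IUPAC codes (e.g. W, Y, N).
count_reads_with_prefix() {
  awk -v p="$2" '
    # The sequence is the 2nd line of every 4-line record
    NR % 4 == 2 && index($0, p) == 1 { n++ }
    # "n + 0" prints 0 instead of an empty string when nothing matched
    END { print n + 0 }
  ' "$1"
}
```

For example, `count_reads_with_prefix 18S_R1_001.fastq TGGTGCATGGCCGTTCTTAGT` prints the number of 18S R1 reads that still start with the forward primer. A nonzero count would be consistent with the single-adapter behavior described above.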

You should probably use linked adapters if you want to remove a 5’ and a 3’ adapter at the same time.

So the first command would look something like this:

cutadapt \
  --pair-adapters \
  -a ^TGGTGCATGGCCGTTCTTAGT...GGTCTGTGATGCCCTTAGATG \
  -A ^CATCTAAGGGCATCACAGACC...ACTAAGAACGGCCATGCACCA \
  -o 18S_R1_001.fastq \
  -p 18S_R2_001.fastq \
  --untrimmed-output R1_untrimmed_18S.fastq \
  --untrimmed-paired-output R2_untrimmed_18S.fastq \
  input_R1_001.fastq \
  input_R2_001.fastq

However, you can also use Cutadapt’s demultiplexing feature to combine the three commands into a single one (untested):

cutadapt \
  --pair-adapters \
  -a 18S=^TGGTGCATGGCCGTTCTTAGT...GGTCTGTGATGCCCTTAGATG \
  -A ^CATCTAAGGGCATCACAGACC...ACTAAGAACGGCCATGCACCA \
  -a COI=^GGWACWGGWTGAACWGTWTAYCCYCC...TGRTTYTTYGGNCAYCCNGARGTNTA \
  -A ^TANACYTCNGGRTGNCCRAARAAYCA...GGRGGRTAWACWGTTCAWCCWGTWCC \
  -a ITS=^CTTGGTCATTTAGAGGAAGTAA...GCATCGATGAAGAACGCAGC \
  -A ^GCTGCGTTCTTCATCGATGC...TTACTTCCTCTAAATGACCAAG \
  -o "output_{name}_R1.fastq" \
  -p "output_{name}_R2.fastq" \
  input_R1_001.fastq \
  input_R2_001.fastq

This will create six output files (output_18S_R1.fastq, output_COI_R1.fastq, and so on), plus output_unknown_R1.fastq and output_unknown_R2.fastq for the reads without a match.
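To see how the reads were distributed across the demultiplexed files, a quick read count per file can help. The function `summarize_fastq` below is a hypothetical shell helper, not a Cutadapt feature; it assumes uncompressed FASTQ with exactly four lines per record and prints each file name followed by its number of reads.

```shell
# summarize_fastq FILE...
# Hypothetical helper (not part of Cutadapt): prints "<file> <reads>" for
# each FASTQ file given, assuming uncompressed FASTQ with four lines per record.
summarize_fastq() {
  for f in "$@"; do
    # NR still holds the total line count in the END block
    awk -v name="$f" 'END { print name, NR / 4 }' "$f"
  done
}
```

Running `summarize_fastq output_*_R1.fastq` after the demultiplexing command would list one line per gene plus one for `output_unknown_R1.fastq`; a large unknown count suggests many read pairs matched none of the linked adapters.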

naurasd commented, Aug 25, 2021

Just as a little hint when using the following demultiplexing command:

cutadapt \
  --pair-adapters \
  -a 18S=^TGGTGCATGGCCGTTCTTAGT...GGTCTGTGATGCCCTTAGATG \
  -A ^CATCTAAGGGCATCACAGACC...ACTAAGAACGGCCATGCACCA \
  -a COI=^GGWACWGGWTGAACWGTWTAYCCYCC...TGRTTYTTYGGNCAYCCNGARGTNTA \
  -A ^TANACYTCNGGRTGNCCRAARAAYCA...GGRGGRTAWACWGTTCAWCCWGTWCC \
  -a ITS=^CTTGGTCATTTAGAGGAAGTAA...GCATCGATGAAGAACGCAGC \
  -A ^GCTGCGTTCTTCATCGATGC...TTACTTCCTCTAAATGACCAAG \
  -o "output_{name}_R1.fastq" \
  -p "output_{name}_R2.fastq" \
  input_R1_001.fastq \
  input_R2_001.fastq

In our case, the demultiplexing resulted in quite a few reads with a length of zero bp, which created some problems for downstream analysis. I guess it would be a good idea to add