Primers remaining in the output files after demultiplexing
Hello!
I am using cutadapt to demultiplex my reads (v2.10 with Python 3.7.6).
Basically, I have a pair of fastq files (e.g. input_R1_001.fastq and input_R2_001.fastq) which contain sequences from three genes (18S, COI and ITS) and I want to create a pair of fastq files for each gene.
Here is an example of the command I am using: `cutadapt --pair-adapters -g TGGTGCATGGCCGTTCTTAGT -a GGTCTGTGATGCCCTTAGATG -G CATCTAAGGGCATCACAGACC -A ACTAAGAACGGCCATGCACCA -o 18S_R1_001.fastq -p 18S_R2_001.fastq input_R1_001.fastq input_R2_001.fastq --untrimmed-output R1_untrimmed_18S.fastq --untrimmed-paired-output R2_untrimmed_18S.fastq
cutadapt --pair-adapters -g GGWACWGGWTGAACWGTWTAYCCYCC -a TGRTTYTTYGGNCAYCCNGARGTNTA -G TANACYTCNGGRTGNCCRAARAAYCA -A GGRGGRTAWACWGTTCAWCCWGTWCC -o COI_R1_001.fastq -p COI_R2_001.fastq input_R1_001.fastq input_R2_001.fastq --untrimmed-output R1_untrimmed_COI.fastq --untrimmed-paired-output R2_untrimmed_COI.fastq
cutadapt --pair-adapters -g CTTGGTCATTTAGAGGAAGTAA -a GCATCGATGAAGAACGCAGC -G GCTGCGTTCTTCATCGATGC -A TTACTTCCTCTAAATGACCAAG -o ITS_R1_001.fastq -p ITS_R2_001.fastq input_R1_001.fastq input_R2_001.fastq --untrimmed-output R1_untrimmed_ITS.fastq --untrimmed-paired-output R2_untrimmed_ITS.fastq `
The problem is that there are still primers found in the output files, so they are not completely trimmed/removed by cutadapt. Do you have any idea why this might be happening?
Top GitHub Comments
For each of the commands that you listed, either the 5’ adapter or the 3’ adapter is removed from each read (depending on which one matches better) because Cutadapt by default removes only a single adapter from each read. (This holds even for paired-end reads: It will remove at most one from R1 and at most one from R2.)
You should probably use linked adapters if you want to remove a 5’ and a 3’ adapter at the same time.
So the first command would look something like this:
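The command itself is missing from this copy of the thread. As a rough sketch of what a linked-adapter version of the 18S command could look like (not necessarily the maintainer's exact command): Cutadapt writes a linked adapter as `ADAPTER1...ADAPTER2`, so the `-g`/`-a` pair for R1 collapses into a single `-a`, and the `-G`/`-A` pair for R2 into a single `-A`. The shell variable names are made up for illustration, and the `cutadapt` call is only attempted if the tool and both input files are present.

```shell
# Primer sequences copied from the 18S command in the question.
FWD=TGGTGCATGGCCGTTCTTAGT      # forward primer, expected at the 5' end of R1
REV_RC=GGTCTGTGATGCCCTTAGATG   # reverse complement of the reverse primer, 3' end of R1
REV=CATCTAAGGGCATCACAGACC      # reverse primer, expected at the 5' end of R2
FWD_RC=ACTAAGAACGGCCATGCACCA   # reverse complement of the forward primer, 3' end of R2

# Linked adapters: 5' adapter, three dots, 3' adapter.
R1_LINKED="${FWD}...${REV_RC}"
R2_LINKED="${REV}...${FWD_RC}"

echo "R1 linked adapter: $R1_LINKED"
echo "R2 linked adapter: $R2_LINKED"

# Only run cutadapt when it is installed and the input files exist.
if command -v cutadapt >/dev/null 2>&1 \
   && [ -f input_R1_001.fastq ] && [ -f input_R2_001.fastq ]; then
  cutadapt \
    -a "$R1_LINKED" -A "$R2_LINKED" \
    -o 18S_R1_001.fastq -p 18S_R2_001.fastq \
    --untrimmed-output R1_untrimmed_18S.fastq \
    --untrimmed-paired-output R2_untrimmed_18S.fastq \
    input_R1_001.fastq input_R2_001.fastq
fi
```

Whether each half of a linked adapter is required or optional depends on how it is specified, so it is worth checking the linked-adapter section of the Cutadapt documentation for the version in use.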
However, you can also use Cutadapt’s ability for demultiplexing to simplify the three commands into a single one (untested):
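The single command is also missing from this copy of the thread. A sketch of how it might look, equally untested: each linked adapter gets a name via the `name=` prefix, and the `{name}` placeholder in the output paths is replaced by the name of the matching adapter (or `unknown` when nothing matches). The primer sequences come from the three commands in the question. The variable names are hypothetical, and whether `--pair-adapters` can be combined with linked adapters is not something this sketch has verified, so it is left out here.

```shell
# Named linked adapters for R1 (-a) and R2 (-A), one pair per gene.
R1_18S="18S=TGGTGCATGGCCGTTCTTAGT...GGTCTGTGATGCCCTTAGATG"
R2_18S="18S=CATCTAAGGGCATCACAGACC...ACTAAGAACGGCCATGCACCA"
R1_COI="COI=GGWACWGGWTGAACWGTWTAYCCYCC...TGRTTYTTYGGNCAYCCNGARGTNTA"
R2_COI="COI=TANACYTCNGGRTGNCCRAARAAYCA...GGRGGRTAWACWGTTCAWCCWGTWCC"
R1_ITS="ITS=CTTGGTCATTTAGAGGAAGTAA...GCATCGATGAAGAACGCAGC"
R2_ITS="ITS=GCTGCGTTCTTCATCGATGC...TTACTTCCTCTAAATGACCAAG"

# {name} is filled in by cutadapt itself; quoted so the shell leaves it alone.
OUT_R1='output_{name}_R1.fastq'
OUT_R2='output_{name}_R2.fastq'

# Only run cutadapt when it is installed and the input files exist.
if command -v cutadapt >/dev/null 2>&1 \
   && [ -f input_R1_001.fastq ] && [ -f input_R2_001.fastq ]; then
  cutadapt \
    -a "$R1_18S" -A "$R2_18S" \
    -a "$R1_COI" -A "$R2_COI" \
    -a "$R1_ITS" -A "$R2_ITS" \
    -o "$OUT_R1" -p "$OUT_R2" \
    input_R1_001.fastq input_R2_001.fastq
fi
```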
This will then create six output files, `output_18S_R1.fastq`, `output_COI_R1.fastq` and so on, and even `output_unknown_R1.fastq` and `output_unknown_R2.fastq` for the reads without a match.
Just as a little hint when using the following demultiplexing command:
In our case, the demultiplexing resulted in quite a few reads containing sequences with a length of zero bp, which created some problems for downstream analysis. I guess it would be a good idea to add a minimum-length option to the command (set to 1, or whatever other length specification you prefer) to discard zero bp reads.
Cheers Nauras
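For readers who want to try the minimum-length hint above, here is a small sketch. It assumes the option meant is Cutadapt's `-m` (long form `--minimum-length`), which discards reads shorter than the given length after trimming. The helper `count_empty_reads` is a made-up name for illustration. It counts FASTQ records with an empty sequence line, so you can check whether zero bp reads are present in an output file. Only the 18S adapter pair is shown, and the other genes would be added the same way.

```shell
# Count FASTQ records whose sequence line (line 2 of each 4-line record) is empty.
count_empty_reads() {
  awk 'NR % 4 == 2 && length($0) == 0 { n++ } END { print n + 0 }' "$1"
}

MIN_LEN=1   # discard reads shorter than 1 bp, i.e. empty reads

# Only run cutadapt when it is installed and the input files exist.
if command -v cutadapt >/dev/null 2>&1 \
   && [ -f input_R1_001.fastq ] && [ -f input_R2_001.fastq ]; then
  cutadapt \
    -m "$MIN_LEN" \
    -a "18S=TGGTGCATGGCCGTTCTTAGT...GGTCTGTGATGCCCTTAGATG" \
    -A "18S=CATCTAAGGGCATCACAGACC...ACTAAGAACGGCCATGCACCA" \
    -o 'output_{name}_R1.fastq' -p 'output_{name}_R2.fastq' \
    input_R1_001.fastq input_R2_001.fastq
  # With -m 1 in place, this is expected to report 0.
  count_empty_reads output_18S_R1.fastq
fi
```

With paired-end input, how the length filter treats a pair where only one read is too short is controlled by `--pair-filter`, so check that option if pairs are dropped unexpectedly.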