Separating pooled primers

Hi there

I am fairly new to cutadapt. I am running it on an HPC, and the version that is loaded there is cutadapt/1.16-gimkl-2017a-Python-3.6.3.

While I am waiting for this to be updated I am trying to still test the code, as I see it will not differ too much between versions.

I have demultiplexed fastq files, however each file still contains two primer sets that were pooled. I am trying to filter the primers - a pair at a time, each time from my demultiplexed files. Firstly, is it possible to retain ONLY the trimmed sequences?

So far I am trying two scripts. Both loop over the demultiplexed files: the first treats the forward and reverse reads as single reads, and the second treats them as paired-end reads.

Single reads:

#Directories
DATADIR='/Test/Extracted'
Output='/Test2'
PrimerF='AGGGCAAKYCTGGTGCCAGC'
PrimerR='GRCGGTATCTRATCGYCTT' 
mkdir $Output/removed_primers

for i in $DATADIR/*_R1_001.fastq
do
R1=$i;

#replace "_R1_" with "_R2_" to get R2 file name
R2=${R1%_R1_001.fastq}_R2_001.fastq;

SAMPLE1=`basename ${R1%.fastq}`;
SAMPLE2=`basename ${R2%.fastq}`;


echo "processing $R1"
cutadapt \
-a $PrimerF \
--no-indels \
-e 0 \
--discard-untrimmed \
-o $Output/removed_primers/%$SAMPLE1.fastq \
$R1

echo "processing $R2"
cutadapt \
-a $PrimerR \
--no-indels \
-e 0 \
--discard-untrimmed \
-o $Output/removed_primers/%$SAMPLE2.fastq \
$R2

done

This seems to work fine, but when I try to import it into Qiime I get an error that the output cannot be read as a fastq file.

Qiime error: q2-SingleLanePerSamplePairedEndFastqDirFmt-2932vc8m/CR11_3_L001_R1_001.fastq.gz is not a(n) FastqGzFormat file

HOWEVER, CR11_3_L001_R1_001.fastq.gz is not the output file that is in that directory…

This is cutadapt 1.16 with Python 3.6.3
Command line parameters: -a AGGGCAAKYCTGGTGCCAGC -A GRCGGTATCTRATCGYCTT --no-indels -e 0 --discard-untrimmed -o Test2/removed_primers/%CR11_S83_L001_R1_001.fastq -p removed_primers/%CR11_S83_L001_R2_001.fastq Test/Extracted/CR11_S83_L001_R1_001.fastq Test/Extracted/CR11_S83_L001_R2_001.fastq
Running on 1 core
Trimming 2 adapters with at most 0.0% errors in paired-end mode …
Finished in 7.35 s (22 us/read; 2.69 M reads/minute).

=== Summary ===

Total reads processed: 329,471
Reads with adapters: 171 (0.1%)
Reads written (passing filters): 171 (0.1%)

Total basepairs processed: 96,760,386 bp
Total written (filtered): 12,358 bp (0.0%)

=== Adapter 1 ===

Sequence: GRCGGTATCTRATCGYCTT; Type: regular 3’; Length: 19; Trimmed: 171 times.

No. of allowed errors:
0-19 bp: 0

Bases preceding removed adapters:
A: 3.5%
C: 7.6%
G: 2.3%
T: 11.1%
none/other: 75.4%

=== Summary ===

Total reads processed: 329,471
Reads with adapters: 377 (0.1%)
Reads written (passing filters): 377 (0.1%)

Total basepairs processed: 96,535,222 bp
Total written (filtered): 52,385 bp (0.1%)

=== Adapter 1 ===

Sequence: AGGGCAAKYCTGGTGCCAGC; Type: regular 3’; Length: 20; Trimmed: 377 times.

No. of allowed errors:
0-20 bp: 0

Bases preceding removed adapters:
A: 3.4%
C: 17.5%
G: 18.6%
T: 7.2%
none/other: 53.3%
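
On the QIIME 2 error quoted above: the format name `FastqGzFormat` suggests that gzip-compressed input is expected, while the commands in this post write plain `.fastq` files. Whether that is the actual cause here is an assumption, not a confirmed diagnosis. A quick way to see what a file really contains is to look at its first two bytes, since gzip streams start with `1f 8b`. The function name `is_gzip` is made up for this sketch.

```shell
# Return success if the file starts with the gzip magic bytes 0x1f 0x8b.
# Hypothetical helper, not part of cutadapt or QIIME 2.
is_gzip() {
    # od prints the two bytes as hex; tr strips the padding and newline
    [ "$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')" = "1f8b" ]
}

# Example use on every file in a directory passed as $1:
report_compression() {
    for f in "$1"/*; do
        if is_gzip "$f"; then
            echo "$f: gzip"
        else
            echo "$f: not gzip"
        fi
    done
}
```

cutadapt chooses output compression from the file extension, so giving `-o` and `-p` names that end in `.fastq.gz` should produce gzipped output directly.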

Paired end:

#Directories
DATADIR='/Test/Extracted'
Output='/Test2'
PrimerF='AGGGCAAKYCTGGTGCCAGC'
PrimerR='GRCGGTATCTRATCGYCTT' 
mkdir $Output/removed_primers

for i in $DATADIR/*_R1_001.fastq
do
R1=$i;

#replace "_R1_" with "_R2_" to get R2 file name
R2=${R1%_R1_001.fastq}_R2_001.fastq;

SAMPLE1=`basename ${R1%.fastq}`;
SAMPLE2=`basename ${R2%.fastq}`;

cutadapt -a $PrimerF -A $PrimerR --no-indels -e 0 --discard-untrimmed -o $Output/removed_primers/%$SAMPLE1.fastq -p $Output/removed_primers/%$SAMPLE2.fastq $R1 $R2


done

This doesn’t work: all sequences are removed.

=== Summary ===

Total read pairs processed: 329,471
Read 1 with adapter: 377 (0.1%)
Read 2 with adapter: 171 (0.1%)
Pairs written (passing filters): 120 (0.0%)

Total basepairs processed: 193,295,608 bp
Read 1: 96,535,222 bp
Read 2: 96,760,386 bp
Total written (filtered): 0 bp (0.0%)
Read 1: 0 bp
Read 2: 0 bp

Is it possible to filter reads, remove the primers and thus keep these sequences? If so, can you advise me where I may be going wrong? As I would think treating them as single or paired should not necessarily make much of a difference in the output…

Any advice/suggestions would be appreciated.

Many thanks Aimee
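
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the reports above list both primers as `Type: regular 3’`, because `-a` and `-A` describe 3' adapters. For a 3' adapter, cutadapt removes the match and everything after it, so a primer found at the start of a read leaves nothing behind, which would fit the `Total written (filtered): 0 bp` line. Amplicon primers normally sit at the 5' end of each read, and `-g` and `-G` are the 5' counterparts. The helper names `r2_for` and `trim_cmd` and the output file naming below are illustrative assumptions; `trim_cmd` only prints the command so it can be inspected before being run.

```shell
# Primer sequences copied from the post.
PRIMER_F='AGGGCAAKYCTGGTGCCAGC'
PRIMER_R='GRCGGTATCTRATCGYCTT'

# Derive the R2 path from an R1 path (same naming scheme as the loops above).
r2_for() {
    printf '%s\n' "${1%_R1_001.fastq}_R2_001.fastq"
}

# Print a paired-end cutadapt command for one sample.
# $1 = R1 input file, $2 = output directory.
# -g/-G treat the primers as 5' adapters on R1/R2, so the sequence
# after each primer is kept; --discard-untrimmed drops pairs in which
# a primer was not found.
trim_cmd() {
    tc_r1=$1
    tc_out=$2
    tc_r2=$(r2_for "$tc_r1")
    echo cutadapt -g "$PRIMER_F" -G "$PRIMER_R" \
        --no-indels -e 0 --discard-untrimmed \
        -o "$tc_out/$(basename "$tc_r1")" \
        -p "$tc_out/$(basename "$tc_r2")" \
        "$tc_r1" "$tc_r2"
}
```

Calling `trim_cmd "$R1" "$Output/removed_primers"` inside the existing loop prints one command per sample; removing the `echo` runs it instead. Running it once per primer pair, with a separate output directory for each, would split the two pooled primer sets.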

Issue Analytics

Created 5 years ago
Comments: 6 (4 by maintainers)

Top GitHub Comments

AvdReis commented, Mar 11, 2019

Awesome!

Thanks for this, will give it a run 👍

peterjc commented, Jun 28, 2022

Cross reference #546 on I versus N in primers
