Separating pooled primers
Hi there
I am fairly new to cutadapt. I am running it on an HPC, and the version that is loaded is cutadapt/1.16-gimkl-2017a-Python-3.6.3.
While I wait for this to be updated I am still trying to test the code, as I understand it will not differ much between versions.
I have demultiplexed fastq files, however each file still contains two primer sets that were pooled. I am trying to filter the primers - a pair at a time, each time from my demultiplexed files. Firstly, is it possible to retain ONLY the trimmed sequences?
So far I am trying two scripts. Both loop over the demultiplexed files: the first treats the forward and reverse files as single reads, and the second treats them as paired-end reads.
Single reads:
#Directories
DATADIR='/Test/Extracted'
Output='/Test2'
PrimerF='AGGGCAAKYCTGGTGCCAGC'
PrimerR='GRCGGTATCTRATCGYCTT'
mkdir $Output/removed_primers
for i in $DATADIR/*_R1_001.fastq
do
R1=$i;
#replace "_R1_" with "_R2_" to get R2 file name
R2=${R1%_R1_001.fastq}_R2_001.fastq;
SAMPLE1=`basename ${R1%.fastq}`;
SAMPLE2=`basename ${R2%.fastq}`;
echo "processing $R1"
cutadapt \
-a $PrimerF \
--no-indels \
-e 0 \
--discard-untrimmed \
-o $Output/removed_primers/%$SAMPLE1.fastq \
$R1
echo "processing $R2"
cutadapt \
-a $PrimerR \
--no-indels \
-e 0 \
--discard-untrimmed \
-o $Output/removed_primers/%$SAMPLE2.fastq \
$R2
done
This seems to work fine, but when I try to import it into Qiime I get an error that the output cannot be read as a fastq file.
Qiime error: q2-SingleLanePerSamplePairedEndFastqDirFmt-2932vc8m/CR11_3_L001_R1_001.fastq.gz is not a(n) FastqGzFormat file
HOWEVER, CR11_3_L001_R1_001.fastq.gz is not the output file that is in that directory…
This is cutadapt 1.16 with Python 3.6.3
Command line parameters: -a AGGGCAAKYCTGGTGCCAGC -A GRCGGTATCTRATCGYCTT --no-indels -e 0 --discard-untrimmed -o Test2/removed_primers/%CR11_S83_L001_R1_001.fastq -p removed_primers/%CR11_S83_L001_R2_001.fastq Test/Extracted/CR11_S83_L001_R1_001.fastq Test/Extracted/CR11_S83_L001_R2_001.fastq
Running on 1 core
Trimming 2 adapters with at most 0.0% errors in paired-end mode …
Finished in 7.35 s (22 us/read; 2.69 M reads/minute).
=== Summary ===
Total reads processed: 329,471
Reads with adapters: 171 (0.1%)
Reads written (passing filters): 171 (0.1%)
Total basepairs processed: 96,760,386 bp
Total written (filtered): 12,358 bp (0.0%)
=== Adapter 1 ===
Sequence: GRCGGTATCTRATCGYCTT; Type: regular 3'; Length: 19; Trimmed: 171 times.
No. of allowed errors:
0-19 bp: 0
Bases preceding removed adapters:
A: 3.5%
C: 7.6%
G: 2.3%
T: 11.1%
none/other: 75.4%
=== Summary ===
Total reads processed: 329,471
Reads with adapters: 377 (0.1%)
Reads written (passing filters): 377 (0.1%)
Total basepairs processed: 96,535,222 bp
Total written (filtered): 52,385 bp (0.1%)
=== Adapter 1 ===
Sequence: AGGGCAAKYCTGGTGCCAGC; Type: regular 3'; Length: 20; Trimmed: 377 times.
No. of allowed errors:
0-20 bp: 0
Bases preceding removed adapters:
A: 3.4%
C: 17.5%
G: 18.6%
T: 7.2%
none/other: 53.3%
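On the QIIME import error above: the message says the file is not gzip-compressed FASTQ, while the loop writes plain .fastq files. cutadapt picks the output compression from the file extension, so an output name ending in .fastq.gz is written gzip-compressed. Before importing, it may help to confirm that each file really is gzip data and not plain text carrying a .gz name. The following is a minimal sketch of such a check. `is_gzipped_fastq` is a hypothetical helper name (not part of cutadapt or QIIME), and it only inspects the first line of the file, so it is a sanity check and not a full FASTQ validation.

```shell
# Hypothetical helper: succeeds only if the file is valid gzip data AND its
# first decompressed line starts with '@', as a FASTQ header does.
is_gzipped_fastq() {
    # gzip -t tests the compressed stream; it fails on plain-text files.
    gzip -t "$1" 2>/dev/null || return 1
    first_line=$(gzip -dc "$1" | head -n 1)
    case $first_line in
        @*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Example use (commented out; the directory is the one used in the scripts):
# for f in /Test2/removed_primers/*.fastq.gz; do
#     is_gzipped_fastq "$f" || echo "not gzip-compressed FASTQ: $f"
# done
```

The file named in the error is not one of the files cutadapt wrote, so it is also worth checking which directory is actually being imported.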
Paired end:
#Directories
DATADIR='/Test/Extracted'
Output='/Test2'
PrimerF='AGGGCAAKYCTGGTGCCAGC'
PrimerR='GRCGGTATCTRATCGYCTT'
mkdir $Output/removed_primers
for i in $DATADIR/*_R1_001.fastq
do
R1=$i;
#replace "_R1_" with "_R2_" to get R2 file name
R2=${R1%_R1_001.fastq}_R2_001.fastq;
SAMPLE1=`basename ${R1%.fastq}`;
SAMPLE2=`basename ${R2%.fastq}`;
cutadapt -a $PrimerF -A $PrimerR --no-indels -e 0 --discard-untrimmed -o $Output/removed_primers/%$SAMPLE1.fastq -p $Output/removed_primers/%$SAMPLE2.fastq $R1 $R2
done
This doesn’t work: all sequences are removed.
=== Summary ===
Total read pairs processed: 329,471
Read 1 with adapter: 377 (0.1%)
Read 2 with adapter: 171 (0.1%)
Pairs written (passing filters): 120 (0.0%)
Total basepairs processed: 193,295,608 bp
Read 1: 96,535,222 bp
Read 2: 96,760,386 bp
Total written (filtered): 0 bp (0.0%)
Read 1: 0 bp
Read 2: 0 bp
Is it possible to filter reads, remove the primers and thus keep these sequences? If so, can you advise me where I may be going wrong? As I would think treating them as single or paired should not necessarily make much of a difference in the output…
Any advice/suggestions would be appreciated.
Many thanks,
Aimee
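One possible reading of the reports above: the adapters are listed as "regular 3’", which is what -a and -A specify. With a 3' adapter, cutadapt removes the adapter and everything that follows it. If a primer sits at the very start of a read, that leaves an empty read, which would be consistent with the high "none/other" share and with 0 bp written in paired-end mode. Assuming the primers are at the 5' end of each read (an assumption, not confirmed in this thread), -g and -G are the options for 5' adapters. The sketch below only prints the command so it can be checked first. `paired_trim_cmd` is a hypothetical helper, and dropping the leading % and adding .gz to the output names are choices made for this illustration.

```shell
# Dry-run sketch: print (do not run) a paired-end cutadapt command that
# treats the primers as 5' adapters (-g for read 1, -G for read 2).
PrimerF='AGGGCAAKYCTGGTGCCAGC'
PrimerR='GRCGGTATCTRATCGYCTT'

# Hypothetical helper: $1 = R1 input path, $2 = output directory.
paired_trim_cmd() {
    r1=$1
    outdir=$2
    # Derive the R2 path by swapping the _R1_001.fastq suffix.
    r2=${r1%_R1_001.fastq}_R2_001.fastq
    # Output names end in .gz so that cutadapt writes gzip-compressed files.
    printf 'cutadapt -g %s -G %s --no-indels -e 0 --discard-untrimmed -o %s -p %s %s %s\n' \
        "$PrimerF" "$PrimerR" \
        "$outdir/$(basename "$r1").gz" "$outdir/$(basename "$r2").gz" \
        "$r1" "$r2"
}

# Example use (commented out):
# for i in /Test/Extracted/*_R1_001.fastq; do
#     paired_trim_cmd "$i" /Test2/removed_primers
# done
```

Once a printed line looks right it can be run by hand. The helper does no quoting of its own, so paths containing spaces would need extra care.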
Top GitHub Comments
Awesome!
Thanks for this, will give it a run 👍
Cross reference #546 on I versus N in primers