Can I specify a required anchored R1 5' adapter with optional R1+R2 3' adapters without changing the pair filter setting?
I’m using cutadapt 3.5 with Python 3.9.7, installed via conda.
I have paired-end data where there are 3’ adapters on both R1 and R2 that may or may not be found, and an anchored 5’ adapter on R1 that should always be there. I’d like to trim all of these and filter read pairs that don’t have that anchored adapter, but otherwise keep them (regardless of the trimming status of either of the other adapters).
Is there a way to get this behavior without using --pair-filter first? I ask because I apply other filters (quality cutoff and minimum length) that should apply with the default (--pair-filter any) behavior.
I also asked this in full pedantic detail here:
https://bioinformatics.stackexchange.com/questions/17893
example I used there:
- required R1 5’ adapter: AAAAAAAA
- R1 3’ adapter: TTTTTTTT
- R2 3’ adapter: GGGGGGGG
input R1:
@read1 should be kept
AAAAAAAACGTCCTGGGATTTGTAATAATATTTTAGTTCTGAGCGACAAGTAAGGGATAATTTTTTTT
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@read2 should be filtered
TACCAACTACATTTAGCTTCAGGCTAGTGATGCCCGCCGTCGGCACACTGGACACATGGCTTTTTTTT
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@read3 should be kept
AAAAAAAACAATTTTTACTCTAGAAATGTCTGTGCTCATCACCCCGACACCGAATAGCTATGACTCAC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
input R2:
@read1
AGGCTCACCGCCACTGTTGTACCTTCCTATCGCCACTCTAAGATCTATGTAACCTCTCCCGGGGGGGG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@read2
TGAATAATACTGCGGTTTAATCGATATATTCGGGATTATGCAAGACACCCTACGTACTTAGGGGGGGG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@read3
TACTGGAGGGTACAAGCCCGGACTATCCACGGCGTCAGCTGGCTTAGCATTGAAGGTCGGCGGTGTGT
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
intended output R1:
@read1 should be kept
CGTCCTGGGATTTGTAATAATATTTTAGTTCTGAGCGACAAGTAAGGGATAA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@read3 should be kept
CAATTTTTACTCTAGAAATGTCTGTGCTCATCACCCCGACACCGAATAGCTATGACTCAC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
intended output R2:
@read1
AGGCTCACCGCCACTGTTGTACCTTCCTATCGCCACTCTAAGATCTATGTAACCTCTCCC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@read3
TACTGGAGGGTACAAGCCCGGACTATCCACGGCGTCAGCTGGCTTAGCATTGAAGGTCGGCGGTGTGT
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
Issue Analytics
- Created 2 years ago
- Comments:5 (3 by maintainers)
Top GitHub Comments
I have thought about this now for a bit and cannot come up with a good command-line syntax, so probably my recommendation will remain to run multiple Cutadapt processes in a pipe. I’ll close it now and wait for at least one other person to request that feature, and then I’ll think about this again.
That’s totally clear now, thanks! Feel free to close this issue unless you want a reminder on your end.
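For readers landing here, the "multiple Cutadapt processes in a pipe" recommendation could look roughly like the sketch below. It is not taken from the thread: `trim_pairs` is a hypothetical helper name, the adapters are the toy sequences from the example above, and the `-q 20` / `-m 20` cutoffs are placeholder values, since the original post does not state them. The first pass only enforces the anchored 5' adapter on R1 (with `--pair-filter=first`) and writes interleaved reads to standard output; the second pass reads them from standard input (`-`), trims the optional 3' adapters, and applies the quality and length filters with `--pair-filter=any`.

```shell
#!/usr/bin/env bash
# Sketch only: two Cutadapt passes joined by a pipe of interleaved reads.
# trim_pairs is a hypothetical helper; adapters and cutoffs are placeholders.
trim_pairs() {
    local r1_in="$1"
    local r2_in="$2"
    local r1_out="$3"
    local r2_out="$4"

    # Pass 1: require the anchored 5' adapter on R1; only R1 decides
    #         whether a pair is discarded (--pair-filter=first).
    # Pass 2: trim the optional 3' adapters, then apply quality and
    #         minimum-length filters with --pair-filter=any.
    cutadapt --interleaved \
        -g '^AAAAAAAA' --discard-untrimmed --pair-filter=first \
        "$r1_in" "$r2_in" |
    cutadapt --interleaved \
        -a TTTTTTTT -A GGGGGGGG \
        -q 20 -m 20 --pair-filter=any \
        -o "$r1_out" -p "$r2_out" -
}

# Example call (file names are placeholders):
# trim_pairs in.R1.fastq in.R2.fastq out.R1.fastq out.R2.fastq
```

Splitting the work this way keeps each pass on a single pair-filter mode, which is the reason one invocation cannot express both requirements. It is worth checking the reports of both passes on a small test set, such as the three read pairs above, before running it on real data.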