--error-correct-cell behaviour
Hi,
Thanks for the tool. I have a problem with --error-correct-cell, and I don’t know whether I have misunderstood something.
First I generate a whitelist:
umi_tools whitelist --extract-method=regex --bc-pattern='(?P<discard_1>TTCG){s<=1}(?P<cell_1>.{15,17})(?P<discard_2>TGCTTACGCTACGGAACGA){s<=3}(?P<umi_1>.{9})' --stdin=input.fastq -S BBC.txt
This gives me, in particular, the following line in the whitelist output file:
CTGTTGATCACCCGTA CTGTTGATCACCCGTAT
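To make the meaning of that line concrete, here is a minimal Python sketch of how such a whitelist could be turned into a correction lookup. It assumes the first column is the accepted barcode and the second column is a comma-separated list of barcodes that should be corrected to it; any further columns are ignored. The function names are hypothetical and this is not UMI-tools' own code.

```python
def parse_whitelist_line(line):
    """Return (accepted_barcode, [variant_barcodes]) for one whitelist line.

    Assumption: column 1 is the accepted barcode, column 2 (if present)
    is a comma-separated list of variants to be corrected to it.
    """
    fields = line.split()
    accepted = fields[0]
    variants = []
    if len(fields) > 1:
        # Keep only entries that look like nucleotide sequences, so that a
        # count column is not mistaken for a barcode list.
        variants = [v for v in fields[1].split(",") if v and set(v) <= set("ACGTN")]
    return accepted, variants


def build_correction_map(lines):
    """Map every variant barcode to the accepted barcode it belongs to."""
    correction_map = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        accepted, variants = parse_whitelist_line(line)
        for variant in variants:
            correction_map[variant] = accepted
    return correction_map


whitelist_lines = ["CTGTTGATCACCCGTA CTGTTGATCACCCGTAT"]
correction_map = build_correction_map(whitelist_lines)
print(correction_map)
```

Under that assumption, the line above says that reads carrying CTGTTGATCACCCGTAT should be assigned to CTGTTGATCACCCGTA.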
And then I do an extract:
umi_tools extract --extract-method=regex --bc-pattern='(?P<discard_1>TTCG){s<=1}(?P<cell_1>.{15,17})(?P<discard_2>TGCTTACGCTACGGAACGA){s<=3}(?P<umi_1>.{9})' --whitelist BBC.txt --error-correct-cell --stdin=input.fastq -S input.BBC.fastq
And in the output FASTQ file, I have this read header:
@M05218:191:000000000-D7R5H:1:1102:14341:19429_CTGTTGATCACCCGTAT_CCTCAAACG 1:N:0:2
I was expecting the BC to be corrected to “CTGTTGATCACCCGTA”, but the header still carries the uncorrected form (the one actually found in the read).
Is this the expected behaviour? Did I misunderstand something?
Cheers, Mathieu
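The expectation described in the question can be written down as a small Python sketch. It assumes the header layout seen in the output above (read name, then `_CELL_UMI`, then an optional space-separated comment) and uses a hypothetical `correct_header` helper; it illustrates the expected result only and is not how UMI-tools implements extraction.

```python
def correct_header(header, correction_map):
    """Replace the cell barcode in a '<name>_<cell>_<umi> <comment>' header.

    `correction_map` maps an observed barcode to the accepted barcode.
    Barcodes that are not in the map are left unchanged.
    """
    read_id, sep, comment = header.partition(" ")
    # The last two underscore-separated fields are the cell barcode and UMI.
    name, cell, umi = read_id.rsplit("_", 2)
    cell = correction_map.get(cell, cell)
    return f"{name}_{cell}_{umi}{sep}{comment}"


observed = "@M05218:191:000000000-D7R5H:1:1102:14341:19429_CTGTTGATCACCCGTAT_CCTCAAACG 1:N:0:2"
mapping = {"CTGTTGATCACCCGTAT": "CTGTTGATCACCCGTA"}
print(correct_header(observed, mapping))
```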
Top GitHub Comments
There is indeed a redundancy between --whitelist and --filter-cell-barcode, where the former is a path to a whitelist file and the latter a switch to filter against it. We can leave this issue open as a reminder to remove the redundant option (suggest --whitelist to switch on --filter-cell-barcode, and hide the --filter-cell-barcode option so as not to break any users’ current pipelines). --error-correct-cell is a separate option, however, since one might wish to retain only cells that perfectly match the whitelist.

Actually, it looks like there is redundancy in the 3 options “--whitelist”, “--error-correct-cell” and “--filter-cell-barcode”, no?
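The distinction between filtering and error correction can be sketched in a few lines of Python. This is only an illustration of the behaviour described in the comments, under the assumption that filtering discards reads whose barcode cannot be resolved and that correction maps known variants to their accepted barcode. The `resolve_cell_barcode` function and its parameters are hypothetical, not part of UMI-tools.

```python
def resolve_cell_barcode(barcode, whitelist, correction_map,
                         filter_cell=True, error_correct=False):
    """Return the barcode to report for a read, or None to discard the read.

    whitelist      -- set of accepted barcodes
    correction_map -- dict mapping variant barcode -> accepted barcode
    filter_cell    -- discard reads whose barcode cannot be resolved
    error_correct  -- rescue variants by mapping them to the accepted barcode
    """
    if barcode in whitelist:
        return barcode  # perfect match is always kept
    if error_correct and barcode in correction_map:
        return correction_map[barcode]  # variant rescued and corrected
    if filter_cell:
        return None  # not resolvable, so the read is dropped
    return barcode  # no filtering: keep the barcode as observed


whitelist = {"CTGTTGATCACCCGTA"}
correction_map = {"CTGTTGATCACCCGTAT": "CTGTTGATCACCCGTA"}

# Filtering only: the variant is discarded, perfect matches are kept.
print(resolve_cell_barcode("CTGTTGATCACCCGTAT", whitelist, correction_map))
# Filtering plus correction: the variant is rescued.
print(resolve_cell_barcode("CTGTTGATCACCCGTAT", whitelist, correction_map,
                           error_correct=True))
```

In this sketch, filtering without correction corresponds to retaining only cells that perfectly match the whitelist, which is why the two switches are not interchangeable.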