Overzealous filtering of reads during CallDuplexConsensusReads

Hi,

I’m using GroupReadsByUmi followed by CallDuplexConsensusReads on high-depth, high-quality paired-end reads. GroupReadsByUmi creates a 25 GB BAM file. When I then run CallDuplexConsensusReads, the output is only about 500 MB. I was expecting a similarly sized output file.

Here are the commands I ran:

java -Xmx64g -jar fgbio.jar GroupReadsByUmi --strategy=paired --input=my_sample_mapped.bam --output=my_sample_groupedUMI.bam --raw-tag=RX --assign-tag=MI --min-map-q=10 --edits=1

java -Xmx64g -jar fgbio.jar CallDuplexConsensusReads --input=my_sample_groupedUMI.bam --output=my_sample_ds_consensus_unaligned.bam --error-rate-pre-umi=45 --error-rate-post-umi=30 --min-input-base-quality=10 --threads=12

Am I misunderstanding the output of CallDuplexConsensusReads? Is there a next step I need to do, such as merging the output of CallDuplexConsensusReads with some other BAM file, in order to get a BAM with all duplex consensus reads?

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

tfenne commented, Sep 12, 2019

Yeah, that’s your problem. Setting --min-reads to 0, or perhaps --min-reads 1 0 0, would work. But beware that the vast majority of your output will then basically be consensus reads formed from single reads, which is largely the same as having the raw reads themselves.
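To make the effect of --min-reads 1 0 0 concrete, here is a minimal Python sketch of the filter as I understand it from the fgbio documentation. It is not fgbio's actual code. It assumes the three values are, in order, the minimum for the final consensus, for the larger single-strand family, and for the smaller single-strand family, and that a shorter list is padded by repeating its last value. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def expand_min_reads(values: Sequence[int]) -> Tuple[int, int, int]:
    """Pad a 1-3 value --min-reads list to three values by repeating the last.

    Assumption: this mirrors the documented behaviour (e.g. `1` -> `1 1 1`).
    """
    vals = list(values)
    if not 1 <= len(vals) <= 3:
        raise ValueError("--min-reads takes between one and three values")
    while len(vals) < 3:
        vals.append(vals[-1])
    return vals[0], vals[1], vals[2]


def passes_min_reads(ab_reads: int, ba_reads: int, min_reads: Sequence[int]) -> bool:
    """Return True if a molecule with the given per-strand read counts is kept."""
    total_min, larger_min, smaller_min = expand_min_reads(min_reads)
    larger, smaller = max(ab_reads, ba_reads), min(ab_reads, ba_reads)
    return (
        ab_reads + ba_reads >= total_min
        and larger >= larger_min
        and smaller >= smaller_min
    )


# A molecule seen on only one strand (5 AB reads, 0 BA reads):
single_strand_default = passes_min_reads(5, 0, [1])        # treated as 1 1 1
single_strand_relaxed = passes_min_reads(5, 0, [1, 0, 0])  # --min-reads 1 0 0
```

Under these assumptions the single-strand molecule is dropped with the default and kept with 1 0 0, which is why the relaxed setting produces many "consensus" reads that are built from one strand only.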

tfenne commented, Sep 11, 2019

The other possibility here is that the --min-reads defaults are causing a lot of reads to be discarded. By default it requires at least 1 read from each strand to form a duplex consensus. If you have a lot of molecules with only reads from one of the two original strands, then a lot of your data will be discarded/filtered and not make it into consensus.
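One way to check whether this explains the small output is to tally, per molecule, how many reads come from each strand. The sketch below is a rough diagnostic, not part of fgbio. It assumes the MI tags written by GroupReadsByUmi with --strategy=paired have the form "<molecule id>/A" or "<molecule id>/B", and it takes the tag values as plain strings; extracting them from the BAM (for example with samtools or pysam) is left out. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def strand_counts(mi_tags: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Count reads per strand (A/B) for each molecule id found in the MI tags."""
    counts = defaultdict(lambda: {"A": 0, "B": 0})
    for tag in mi_tags:
        molecule, sep, strand = tag.rpartition("/")
        if not sep or strand not in ("A", "B"):
            raise ValueError(f"unexpected MI tag format: {tag!r}")
        counts[molecule][strand] += 1
    return dict(counts)


def summarize(mi_tags: Iterable[str]) -> Dict[str, int]:
    """Report how many molecules have reads from both strands versus only one."""
    counts = strand_counts(mi_tags)
    both = sum(1 for c in counts.values() if c["A"] > 0 and c["B"] > 0)
    return {
        "molecules": len(counts),
        "both_strands": both,
        "single_strand_only": len(counts) - both,
    }


# Toy input: molecule 1 has both strands, molecules 2 and 3 have one strand each.
example = summarize(["1/A", "1/A", "1/B", "2/A", "3/B", "3/B"])
```

If "single_strand_only" dominates on the real data, most molecules cannot form a duplex consensus under the default setting, which would be consistent with the much smaller output file.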
