
StackOverflowError in GroupReadsByUmi in deep regions

See original GitHub issue

Nils and Tim: we’re running into a StackOverflowError when dealing with deep regions that have a large number of reads per UMI. I’ve put together a reproducible example here with an input BAM and run script:

wget https://s3.amazonaws.com/chapmanb/testcases/fgbio_umi_overflow.tar.gz

The command line we’re using is:

fgbio -Xms500m -Xmx4g -XX:+UseSerialGC --async-io=true --compression=0 GroupReadsByUmi --edits=1 --min-map-q=1 -t RX -s adjacency -i in.bam | fgbio -Xms500m -Xmx4g -XX:+UseSerialGC  --async-io=true --compression=0 CallMolecularConsensusReads --min-input-base-quality=10 --min-reads=1 --max-reads=10 --output-per-base-tags=false --sort-order=:none: -i /dev/stdin -o out.bam

I’d been hoping --max-reads would help downsample and avoid the issue, but it doesn’t seem to impact runtime or resolve the underlying problem. I’ve also tried it on a different, non-failing (but slow) sample and it didn’t seem to help runtimes there either, so I may be misusing it in trying to power through diverse deep regions.

Any suggestions or tips would be much appreciated.

[2017/10/13 13:49:37 | FgBioMain | Info] Executing GroupReadsByUmi from fgbio version 0.2.1-SNAPSHOT as kkrq359@gpu2 on JRE 1.8.0_102-b14 with snappy
[2017/10/13 13:49:37 | FgBioMain | Info] Executing CallMolecularConsensusReads from fgbio version 0.2.1-SNAPSHOT as kkrq359@gpu2 on JRE 1.8.0_102-b14 with snappy
[2017/10/13 13:49:37 | GroupReadsByUmi | Info] Filtering and sorting input.
[2017/10/13 13:50:03 | GroupReadsByUmi | Info] Sorted     1,000,000 record.  Elapsed time: 00:00:25s.  Time for last 1,000,000:   25s.  Last read position: chr6:157,416,972
[2017/10/13 13:50:07 | GroupReadsByUmi | Info] Assigning reads to UMIs and outputting.
[2017/10/13 14:01:41 | CallMolecularConsensusReads | Info] processed       500,000 record.  Elapsed time: 00:11:30s.  Time for last 500,000:  689s.  Last read position: chr6:157,416,666
[2017/10/13 14:16:07 | CallMolecularConsensusReads | Info] processed     1,000,000 record.  Elapsed time: 00:25:55s.  Time for last 500,000:  865s.  Last read position: chr6:157,416,666
[2017/10/13 14:43:37 | FgBioMain | Info] GroupReadsByUmi failed. Elapsed time: 54.03 minutes.
Exception in thread "main" java.lang.StackOverflowError
        at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:135)
        at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:121)
        at scala.collection.GenSeqLike.$anonfun$indexOf$1(GenSeqLike.scala:146)
        at scala.collection.GenSeqLike.$anonfun$indexOf$1$adapted(GenSeqLike.scala:146)
        at scala.collection.IndexedSeqOptimized.$anonfun$indexWhere$1(IndexedSeqOptimized.scala:203)
        at scala.collection.IndexedSeqOptimized.$anonfun$indexWhere$1$adapted(IndexedSeqOptimized.scala:203)
        at scala.collection.IndexedSeqOptimized.segmentLength(IndexedSeqOptimized.scala:194)
        at scala.collection.IndexedSeqOptimized.segmentLength$(IndexedSeqOptimized.scala:191)
        at scala.collection.mutable.ArrayBuffer.segmentLength(ArrayBuffer.scala:48)
        at scala.collection.IndexedSeqOptimized.indexWhere(IndexedSeqOptimized.scala:203)
        at scala.collection.IndexedSeqOptimized.indexWhere$(IndexedSeqOptimized.scala:201)
        at scala.collection.mutable.ArrayBuffer.indexWhere(ArrayBuffer.scala:48)
        at scala.collection.GenSeqLike.indexOf(GenSeqLike.scala:146)
        at scala.collection.GenSeqLike.indexOf$(GenSeqLike.scala:146)
        at scala.collection.AbstractSeq.indexOf(Seq.scala:41)
        at scala.collection.GenSeqLike.indexOf(GenSeqLike.scala:130)
        at scala.collection.GenSeqLike.indexOf$(GenSeqLike.scala:130)
        at scala.collection.AbstractSeq.indexOf(Seq.scala:41)
        at scala.collection.mutable.BufferLike.$minus$eq(BufferLike.scala:130)
        at scala.collection.mutable.BufferLike.$minus$eq$(BufferLike.scala:129)
        at scala.collection.mutable.AbstractBuffer.$minus$eq(Buffer.scala:49)
        at scala.collection.mutable.ArrayBuffer.$minus$eq(ArrayBuffer.scala:48)
        at scala.collection.generic.Shrinkable.$anonfun$$minus$minus$eq$1(Shrinkable.scala:49)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:52)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at scala.collection.generic.Shrinkable.$minus$minus$eq(Shrinkable.scala:49)
        at scala.collection.generic.Shrinkable.$minus$minus$eq$(Shrinkable.scala:49)
        at scala.collection.mutable.AbstractBuffer.$minus$minus$eq(Buffer.scala:49)
        at com.fulcrumgenomics.umi.GroupReadsByUmi$AdjacencyUmiAssigner.addChildren$1(GroupReadsByUmi.scala:212)
        at com.fulcrumgenomics.umi.GroupReadsByUmi$AdjacencyUmiAssigner.$anonfun$assign$11(GroupReadsByUmi.scala:213)
        at com.fulcrumgenomics.umi.GroupReadsByUmi$AdjacencyUmiAssigner.$anonfun$assign$11$adapted(GroupReadsByUmi.scala:213)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:52)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at com.fulcrumgenomics.umi.GroupReadsByUmi$AdjacencyUmiAssigner.addChildren$1(GroupReadsByUmi.scala:213)
        at com.fulcrumgenomics.umi.GroupReadsByUmi$AdjacencyUmiAssigner.$anonfun$assign$11(GroupReadsByUmi.scala:213)
        at com.fulcrumgenomics.umi.GroupReadsByUmi$AdjacencyUmiAssigner.$anonfun$assign$11$adapted(GroupReadsByUmi.scala:213)
        [....]
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:52)
Exception in thread "pool-6-thread-1" java.lang.NullPointerException
        at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:530)
        at htsjdk.samtools.util.AsyncBlockCompressedInputStream$AsyncBlockCompressedInputStreamRunnable.run(AsyncBlockCompressedInputStream.java:225)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

1 reaction
chapmanb commented, Oct 20, 2017

Tim and Nils – thanks so much for this. I tested your branch on the original dataset and it works great. It can power through the excessive number of nearly identical UMIs and no longer overflows.

In this case, I think the original sample itself is probably not going to have a lot of useful data, and I’m writing it off as more of a failed sample than anything. However, it’s really nice to be able to get through the analysis so the research folks can at least take a look.

Thanks again for all the help and looking at this. Much appreciated.

1 reaction
tfenne commented, Oct 17, 2017

@chapmanb I took a brief look at this too. I think you’re right on the problem. I added a little debug logging, and there are a bunch of pair-locations where there are a ton of read pairs. I think the real problem is the pairs at chr5:157416598-157417071, of which there are 260,960. Furthermore, I think there are a lot of UMIs there with errors…

The adjacency strategy tries to build a tree/graph relating UMIs to each other by counts and by edit distance, and then traverse that tree. Traversing the tree is where the StackOverflowError is happening, implying that the tree got really deep, which in turn implies there are long chains of UMIs separated by a single edit each. I’m really curious what kind of data this is and whether that’s expected.
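To see why single-edit chains make the tree deep, here is a small hedged sketch (all names are illustrative, not fgbio’s actual code): a run of UMIs where each differs from the next by exactly one base is all within --edits=1 of its neighbors, so the parent-to-child links form a chain, and a recursive traversal needs one JVM stack frame per link.

```scala
// Sketch: why chains of single-edit UMIs create a deep tree.
// Identifiers here are illustrative, not fgbio's real implementation.
object UmiChainDepth {
  // Hamming distance between two equal-length UMI strings.
  def edits(a: String, b: String): Int =
    a.zip(b).count { case (x, y) => x != y }

  def main(args: Array[String]): Unit = {
    // Build a chain of UMIs, each one edit from its neighbor:
    // AAAAAAAA -> TAAAAAAA -> TTAAAAAA -> ...
    val base  = "A" * 8
    val chain = (0 to 8).map(i => "T" * i + base.drop(i))

    // Every adjacent pair is within edit distance 1 of each other,
    // so a recursive parent->child walk needs one frame per link.
    // With ~260k read pairs at one locus, error-riddled UMIs can
    // form chains deep enough to exhaust the default thread stack.
    assert(chain.sliding(2).forall { case Seq(p, q) => edits(p, q) == 1 })
    println(s"chain length: ${chain.length}")
  }
}
```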

I’ve made some changes on a branch (https://github.com/fulcrumgenomics/fgbio/tree/tf_group_reads_stack_overflow) that should, I think, fix this by using a stack directly instead of recursion. Can you give it a shot?
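The recursion-to-stack rewrite described above can be sketched generically. This is a minimal illustration, assuming a hypothetical `Node` type; it is not the code on the branch, just the general technique of moving the traversal’s pending work onto a heap-allocated stack.

```scala
// Sketch of replacing a recursive tree traversal with an explicit stack.
// `Node` and both traversals are illustrative, not fgbio's actual classes.
import scala.collection.mutable

case class Node(umi: String, children: mutable.Buffer[Node] = mutable.Buffer.empty)

object TraverseUmiTree {
  // Recursive version: one JVM stack frame per node, so a chain of
  // N single-edit UMIs needs N frames and can throw StackOverflowError.
  def collectRecursive(node: Node, out: mutable.Buffer[String]): Unit = {
    out += node.umi
    node.children.foreach(child => collectRecursive(child, out))
  }

  // Iterative version: the same traversal, but pending nodes live on an
  // explicit heap stack, so depth is bounded by memory, not thread stack.
  def collectIterative(root: Node): Seq[String] = {
    val out   = mutable.Buffer.empty[String]
    val stack = mutable.Stack(root)
    while (stack.nonEmpty) {
      val node = stack.pop()
      out += node.umi
      node.children.foreach(stack.push)
    }
    out.toSeq
  }

  def main(args: Array[String]): Unit = {
    // A deep chain: root -> umi1 -> umi2 -> ... (100,000 links), the
    // kind of shape that overflows the recursive walk.
    val root = Node("root")
    var tip  = root
    for (i <- 1 to 100000) {
      val child = Node(s"umi$i")
      tip.children += child
      tip = child
    }
    val visited = collectIterative(root)
    assert(visited.length == 100001)
    println(s"visited ${visited.length} nodes iteratively")
  }
}
```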

