Prometheus async worker thread crashes after upgrade
Steps to reproduce
I upgraded a Jenkins instance that had a previously working Prometheus metrics plugin. It runs the official Jenkins Docker image (Alpine). I upgraded Jenkins core from 2.176.2 to 2.176.3 and the Prometheus plugin from 2.0.0 to 2.0.6.
On startup, and periodically afterwards, I see the following stack trace in my logs:
Sep 18, 2019 5:11:26 PM hudson.model.AsyncPeriodicWork$1 run
INFO: Started prometheus_async_worker
Sep 18, 2019 5:11:26 PM hudson.init.impl.InstallUncaughtExceptionHandler$DefaultUncaughtExceptionHandler uncaughtException
SEVERE: A thread (prometheus_async_worker thread/194) died unexpectedly due to an uncaught exception, this may leave your Jenkins in a bad way and is usually indicative of a bug in the code.
java.lang.StackOverflowError
at java.util.TreeMap.put(TreeMap.java:568)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:44)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
at org.jenkinsci.plugins.prometheus.util.FlowNodes.traverseTree(FlowNodes.java:49)
...
Issue Analytics
- State:
- Created 4 years ago
- Reactions: 3
- Comments: 22 (4 by maintainers)
Top GitHub Comments
My first guess is that the problem comes from traversing the FlowNodes with a recursive algorithm. If we move to an iterative algorithm (which is hard to derive from first principles, but fortunately we have Google), I expect one of two things to happen:
- The traversal currently runs out of stack space simply because larger Jenkins deployments have more FlowNodes than the call stack can hold. An iterative traversal keeps its bookkeeping on the heap instead of the call stack, which would completely resolve the issue.
- The traversal turns out to be truly infinite, at which point we can figure out why that is happening. Given that FlowNodes are a Jenkins construct and not one of this plugin, I’d be surprised if the tree is constructed badly.
I concede I don’t know exactly what FlowNodes are or why this broke between 2.0.0 and 2.0.6, but that’s my first guess.
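To make the recursive-to-iterative idea concrete, here is a minimal sketch of a depth-first traversal that uses an explicit stack on the heap. It is not the plugin's code: `Node` and `IterativeTraversal` are hypothetical stand-ins and do not reflect the real Jenkins FlowNode API. The `seen` set is an extra assumption on my part; it keeps the walk finite if the graph happens to contain a cycle.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

final class IterativeTraversal {

    // Hypothetical stand-in for a pipeline node; NOT the real Jenkins FlowNode API.
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    // Depth-first walk that keeps pending nodes in a heap-allocated deque
    // instead of relying on one call-stack frame per node.
    static List<Node> traverse(Node root) {
        List<Node> visitedOrder = new ArrayList<>();
        if (root == null) {
            return visitedOrder;
        }
        // Identity-based set: a node reached twice (shared child or cycle) is processed once.
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node current = stack.pop();
            if (!seen.add(current)) {
                continue; // already visited
            }
            visitedOrder.add(current);
            for (Node child : current.children) {
                stack.push(child);
            }
        }
        return visitedOrder;
    }

    public static void main(String[] args) {
        // Build a chain of 100,000 nodes, far deeper than a typical recursive walk tolerates.
        Node root = new Node("n0");
        Node tail = root;
        for (int i = 1; i < 100_000; i++) {
            Node next = new Node("n" + i);
            tail.children.add(next);
            tail = next;
        }
        System.out.println(traverse(root).size());
    }
}
```

With this shape, the depth of the graph is limited by heap size rather than thread stack size, and a malformed graph shows up as a bounded walk instead of a `StackOverflowError`.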
I just ran into the same issue on Jenkins 2.199 with Prometheus plugin 2.0.6. I was forced to downgrade the plugin to 2.0.0 because one CPU ended up pegged and Jenkins would no longer service requests.