LoggingHandler hangs the thread
Our application is running on Google Kubernetes Engine using the gcr.io/google-appengine/jetty image and uses com.google.cloud.logging.LoggingHandler to publish logs to Stackdriver. We noticed some worker threads becoming unresponsive over time. When the pod is shutting down, we see the following exception for each of them:
java.lang.RuntimeException: java.lang.InterruptedException
    at com.google.cloud.logging.LoggingImpl.flush(LoggingImpl.java:545)
    at com.google.cloud.logging.LoggingImpl.write(LoggingImpl.java:525)
    at com.google.cloud.logging.LoggingHandler.publish(LoggingHandler.java:273)
    at java.util.logging.Logger.log(Logger.java:738)
    at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:582)
    at org.slf4j.impl.JDK14LoggerAdapter.error(JDK14LoggerAdapter.java:500)
    ...
Caused by: java.lang.InterruptedException
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:449)
    at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:79)
    at com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:63)
    at com.google.cloud.logging.LoggingImpl.flush(LoggingImpl.java:543)
    ... 30 more
We’ll try to extract a thread dump to see why the future never completes, but the issue seems dangerous by itself: LoggingImpl.java:543 uses the non-timeout version of Future.get(), which can cause any logger call to block the current thread forever unless interrupted. Would it be possible to use the timeout version with a reasonably big timeout, e.g. 60 seconds?
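For illustration only (this is not the library's actual flush implementation), here is a minimal sketch of the change being asked for, assuming a hypothetical list of pending write futures as a stand-in for LoggingImpl's internal state:

import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedFlushSketch {
    // Hypothetical stand-in for the pending write futures tracked by the logger.
    private final List<Future<Void>> pendingWrites;

    BoundedFlushSketch(List<Future<Void>> pendingWrites) {
        this.pendingWrites = pendingWrites;
    }

    // Current behavior, simplified: get() with no timeout can park the logging thread forever.
    void flushUnbounded() throws ExecutionException, InterruptedException {
        for (Future<Void> f : pendingWrites) {
            f.get();
        }
    }

    // Requested behavior: bound the wait so a stuck RPC surfaces as an error instead of a hang.
    void flushBounded() throws ExecutionException, InterruptedException, TimeoutException {
        for (Future<Void> f : pendingWrites) {
            f.get(60, TimeUnit.SECONDS);
        }
    }
}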
Hi,

Having enabled Stackdriver logging (via spring-cloud-gcp-starter-logging and the gcloud logback appender) about a week ago, we are facing recurring issues with it.

First of all, it appears that whenever LoggingImpl has to flush, it is a blocking, synchronous operation from the caller's perspective. I initially assumed that writeSynchronicity (which is set to ASYNC by default for logback) would ensure that all underlying API calls are offloaded to some background processing thread. That does not seem to be the case. It would be great if all asynchronous and blocking operations (and configuration options, when available/applicable) were properly documented; otherwise it involves a lot of guesswork on the part of library users.
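For what it's worth, here is a minimal sketch (not taken from the appender or the starter, just the plain google-cloud-logging API) of why the caller ends up waiting: write() hands entries to the client, but flush() completes all pending writes on the calling thread. The log name "test-log" is only a placeholder:

import com.google.cloud.logging.LogEntry;
import com.google.cloud.logging.Logging;
import com.google.cloud.logging.Logging.WriteOption;
import com.google.cloud.logging.LoggingOptions;
import com.google.cloud.logging.Payload.StringPayload;
import java.util.Collections;

public class CallerBlockedByFlush {
    public static void main(String[] args) throws Exception {
        try (Logging logging = LoggingOptions.getDefaultInstance().getService()) {
            // write() hands the entry to the client; the RPC itself may be deferred
            // depending on the configured synchronicity.
            logging.write(
                Collections.singleton(LogEntry.of(StringPayload.of("hello"))),
                WriteOption.logName("test-log"));
            // flush() waits for every pending write to complete on *this* thread, with no
            // timeout, so a stuck RPC stalls the caller rather than a background worker.
            logging.flush();
        }
    }
}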
So in order to mitigate sporadic delays caused by the gcloud logging appender, we decided to wrap it inside an AsyncAppender. Unfortunately, that made matters worse. At some point the cloud backend failed with UNAVAILABLE: Authentication backend unavailable, which completely locked up the application's processing thread. I've taken a thread dump (at the bottom of this message), and it looks like we effectively have a deadlock there. So the application is in a state where recovery is impossible.

My next step in trying to mitigate this will be increasing the queue size for AsyncAppender and enabling the neverBlock option on it (a rough sketch is shown below). That means some messages will be lost when the gcloud backend chokes up, but at least the application won't be deadlocked by it. Again, maybe it's worth updating the documentation (both the project's readme file and the Stackdriver docs) to include the recommended AsyncAppender settings.

And here comes a thread dump:
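As a rough illustration of those settings (a larger queue plus neverBlock), and not an official recommendation, the sketch below wires an already-configured Stackdriver appender through a logback AsyncAppender programmatically. The appender name "CLOUD", the wrapper name "ASYNC_CLOUD" and the queue size 8192 are placeholders; the same two properties map to the queueSize and neverBlock elements when configured in logback.xml.

import ch.qos.logback.classic.AsyncAppender;
import ch.qos.logback.classic.Logger;
import ch.qos.logback.classic.LoggerContext;
import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.Appender;
import org.slf4j.LoggerFactory;

public class WrapCloudAppender {
    public static void main(String[] args) {
        LoggerContext context = (LoggerContext) LoggerFactory.getILoggerFactory();
        Logger root = context.getLogger(org.slf4j.Logger.ROOT_LOGGER_NAME);

        // "CLOUD" is a placeholder for whatever name the Stackdriver appender
        // was given in logback.xml; adjust it to match the real configuration.
        Appender<ILoggingEvent> cloud = root.getAppender("CLOUD");

        AsyncAppender async = new AsyncAppender();
        async.setContext(context);
        async.setName("ASYNC_CLOUD");
        async.setQueueSize(8192);    // larger buffer to absorb backend slowness
        async.setNeverBlock(true);   // drop events instead of blocking when the queue is full
        async.addAppender(cloud);
        async.start();

        // Route logging through the async wrapper instead of hitting the cloud appender directly.
        root.detachAppender(cloud);
        root.addAppender(async);
    }
}

With neverBlock enabled, AsyncAppender discards events when its queue is full rather than blocking the producing thread, trading log completeness for application availability.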
This seems related to #1795. The underlying client provides no way to initiate a flush.
@sparhomenko Do you have a sense of how long we’re waiting? The above bug might cause us to wait a few seconds, but it shouldn’t cause a prolonged hang. I agree that the wait with no timeout is a problem, but I want to understand the problem better before sending fixes.