
LoggingHandler hangs the thread

See original GitHub issue

Our application runs on Google Kubernetes Engine using the gcr.io/google-appengine/jetty image and uses com.google.cloud.logging.LoggingHandler to publish logs to Stackdriver. We noticed some worker threads becoming unresponsive over time. When the pod shuts down, we see the following exception for each of them:

java.lang.RuntimeException: java.lang.InterruptedException
	at com.google.cloud.logging.LoggingImpl.flush(LoggingImpl.java:545)
	at com.google.cloud.logging.LoggingImpl.write(LoggingImpl.java:525)
	at com.google.cloud.logging.LoggingHandler.publish(LoggingHandler.java:273)
	at java.util.logging.Logger.log(Logger.java:738)
	at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:582)
	at org.slf4j.impl.JDK14LoggerAdapter.error(JDK14LoggerAdapter.java:500)
        ...
Caused by: java.lang.InterruptedException
	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:449)
	at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:79)
	at com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:63)
	at com.google.cloud.logging.LoggingImpl.flush(LoggingImpl.java:543)
	... 30 more

We’ll try to extract a thread dump to see why the future never completes, but the issue seems dangerous by itself: LoggingImpl.java:543 uses the no-timeout version of Future.get(), which can block any logger call forever unless the thread is interrupted. Would it be possible to use the timed variant with a reasonably large timeout, e.g. 60 seconds?
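To illustrate the difference, here is a minimal sketch of a bounded wait. The names (`boundedFlush`, `pendingWrite`) are hypothetical and the `CompletableFuture` merely stands in for the hung write RPC; this is not the actual LoggingImpl code, just the timed `get()` pattern being proposed:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedFlush {
    // Waits at most timeoutMs for the pending write; returns true if it timed out.
    static boolean boundedFlush(Future<?> pending, long timeoutMs) throws Exception {
        try {
            pending.get(timeoutMs, TimeUnit.MILLISECONDS); // timed, unlike plain get()
            return false;
        } catch (TimeoutException e) {
            return true; // caller can log a warning and move on instead of hanging
        }
    }

    public static void main(String[] args) throws Exception {
        // A future that never completes, standing in for a write that hangs.
        CompletableFuture<Void> pendingWrite = new CompletableFuture<>();
        System.out.println(boundedFlush(pendingWrite, 100)); // prints "true"
    }
}
```

With the plain `get()` the same call would park the logging thread indefinitely, which is exactly the hang observed above.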

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 21 (7 by maintainers)

Top GitHub Comments

2 reactions
sergei-ivanov commented, Oct 19, 2018

Hi,

Having enabled Stackdriver logging (via spring-cloud-gcp-starter-logging and the gcloud logback appender) about a week ago, we have been facing recurring issues with it.

First of all, it appears that whenever LoggingImpl has to flush, the operation is blocking and synchronous from the caller’s perspective. I initially assumed that writeSynchronicity (which defaults to ASYNC for logback) would ensure that all underlying API calls are offloaded to a background processing thread. That does not seem to be the case.

It would be great if all asynchronous and blocking operations (and configuration options, where available/applicable) were properly documented. Otherwise, library users are left with a lot of guesswork.

So, to mitigate sporadic delays caused by the gcloud logging appender, we decided to wrap it in an AsyncAppender. Unfortunately, that made matters worse. At some point the cloud backend failed with UNAVAILABLE: Authentication backend unavailable, which completely locked up the application’s processing thread. I took a thread dump (at the bottom of this message), and it looks like we effectively have a deadlock:

  • The application thread is waiting on the async appender, whose buffer is full, so it is blocked
  • The async appender’s worker thread is waiting for the gcloud logger to flush
  • The gRPC thread failed with an exception and tried to log that exception, which in turn blocked it on the same async appender call!

So the application is in a state from which recovery is impossible.

My next step in trying to mitigate this will be to increase the queue size for AsyncAppender and enable its neverBlock option. That means some messages will be lost when the gcloud backend chokes up, but at least the application won’t be deadlocked by it:

<configuration>

    <springProperty name="APP_NAME" source="spring.application.name" default="spring"/>

    <appender name="CLOUD" class="com.google.cloud.logging.logback.LoggingAppender">
        <flushLevel>WARN</flushLevel>
        <log>${APP_NAME}</log>
    </appender>
    <appender name="ASYNC_CLOUD" class="ch.qos.logback.classic.AsyncAppender">
        <filter class="ch.qos.logback.classic.filter.ThresholdFilter">
            <level>INFO</level>
        </filter>
        <!-- May result in message loss when cloud backend is unavailable, but won't block the application -->
        <neverBlock>true</neverBlock>
        <queueSize>2048</queueSize>
        <appender-ref ref="CLOUD"/>
    </appender>

    <root level="DEBUG">
        <appender-ref ref="ASYNC_CLOUD"/>
        <!-- Other appenders here... -->
    </root>

</configuration>

Again, maybe it’s worth updating the documentation (both the project’s readme file, and the stackdriver docs) to include the recommended AsyncAppender settings.
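The trade-off behind neverBlock comes down to the queue operation used when the buffer is full: a blocking put() parks the caller (the deadlock above), while a non-blocking offer() returns immediately and the event is dropped. A minimal sketch of that behaviour on a plain ArrayBlockingQueue (illustrative only, not the actual AsyncAppenderBase code):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class NeverBlockDemo {
    public static void main(String[] args) {
        BlockingQueue<String> buffer = new ArrayBlockingQueue<>(2);

        // Fill the buffer; imagine the worker thread is stuck in flush().
        buffer.offer("event-1");
        buffer.offer("event-2");

        // With neverBlock=false, put() would park this thread here forever.
        // With neverBlock=true, offer() returns immediately; false means dropped.
        boolean accepted = buffer.offer("event-3");
        System.out.println("event-3 accepted: " + accepted); // prints "event-3 accepted: false"
    }
}
```

A larger queueSize only delays the moment the buffer fills; neverBlock is what actually prevents the caller from parking.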

And here is the thread dump:

"http-nio-8080-exec-1" #58 daemon prio=5 os_prio=0 tid=0x00007f388ff7e000 nid=0x754d waiting on condition [0x00007f3834
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x0000000733d7e4b0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionO
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:
        at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353)
        at ch.qos.logback.core.AsyncAppenderBase.putUninterruptibly(AsyncAppenderBase.java:181)
        at ch.qos.logback.core.AsyncAppenderBase.put(AsyncAppenderBase.java:172)
        at ch.qos.logback.core.AsyncAppenderBase.append(AsyncAppenderBase.java:161)
        at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:84)
        at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:51)
        at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:270)
        at ch.qos.logback.classic.Logger.callAppenders(Logger.java:257)
        at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:421)
        at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:383)
        at ch.qos.logback.classic.Logger.warn(Logger.java:688)
        at <application code>

"AsyncAppender-Worker-ASYNC_CLOUD" #15 daemon prio=5 os_prio=0 tid=0x00007f388cd28000 nid=0x7512 waiting on condition [
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x000000079bba6b18> (a com.google.common.util.concurrent.CollectionFuture$ListFuture)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:471)
        at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:78)
        at com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:62)
        at com.google.cloud.logging.LoggingImpl.flush(LoggingImpl.java:544)
        at com.google.cloud.logging.LoggingImpl.write(LoggingImpl.java:526)
        at com.google.cloud.logging.logback.LoggingAppender.append(LoggingAppender.java:201)
        at com.google.cloud.logging.logback.LoggingAppender.append(LoggingAppender.java:63)
        at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:84)
        at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:51)
        at ch.qos.logback.core.AsyncAppenderBase$Worker.run(AsyncAppenderBase.java:290)
   Locked ownable synchronizers:
        - None

"Gax-2" #16 daemon prio=5 os_prio=0 tid=0x00007f3830410000 nid=0x7522 waiting on condition [0x00007f3870f53000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x0000000733d7e4b0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionO
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:
        at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353)
        at ch.qos.logback.core.AsyncAppenderBase.putUninterruptibly(AsyncAppenderBase.java:181)
        at ch.qos.logback.core.AsyncAppenderBase.put(AsyncAppenderBase.java:172)
        at ch.qos.logback.core.AsyncAppenderBase.append(AsyncAppenderBase.java:161)
        at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:84)
        at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:51)
        at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:270)
        at ch.qos.logback.classic.Logger.callAppenders(Logger.java:257)
        at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:421)
        at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:383)
        at ch.qos.logback.classic.Logger.log(Logger.java:765)
        at org.slf4j.bridge.SLF4JBridgeHandler.callLocationAwareLogger(SLF4JBridgeHandler.java:221)
        at org.slf4j.bridge.SLF4JBridgeHandler.publish(SLF4JBridgeHandler.java:303)
        at java.util.logging.Logger.log(Logger.java:738)
        at java.util.logging.Logger.doLog(Logger.java:765)
        at java.util.logging.Logger.log(Logger.java:876)
        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:907)
        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:813)
        at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:677)
        at com.google.common.util.concurrent.AbstractTransformFuture.run(AbstractTransformFuture.java:112)
        at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:399)
        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:902)
        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:813)
        at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:677)
        at com.google.common.util.concurrent.AbstractCatchingFuture.run(AbstractCatchingFuture.java:134)
        at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:399)
        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:902)
        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:813)
        at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:677)
        at com.google.api.core.AbstractApiFuture$InternalSettableFuture.setException(AbstractApiFuture.java:95)
        at com.google.api.core.AbstractApiFuture.setException(AbstractApiFuture.java:77)
        at com.google.api.gax.rpc.BatchedFuture.setException(BatchedFuture.java:55)
        at com.google.api.gax.rpc.BatchedRequestIssuer.sendResult(BatchedRequestIssuer.java:84)
        at com.google.api.gax.rpc.BatchExecutor$1.onFailure(BatchExecutor.java:96)
        at com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:61)
        at com.google.common.util.concurrent.Futures$4.run(Futures.java:1126)
        at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:399)
        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:902)
        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:813)
        at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:677)
        at com.google.api.core.AbstractApiFuture$InternalSettableFuture.setException(AbstractApiFuture.java:95)
        at com.google.api.core.AbstractApiFuture.setException(AbstractApiFuture.java:77)
        at com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.ja
        at com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:61)
        at com.google.common.util.concurrent.Futures$4.run(Futures.java:1126)
        at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:399)
        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:902)
        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:813)
        at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:677)
        at io.grpc.stub.ClientCalls$GrpcFuture.setException(ClientCalls.java:493)
        at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:468)
        at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
        at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
        at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener
        at io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:684)
        at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
        at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
        at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener
        at io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:403)
        at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459)
        at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63)
        at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546)
        at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:467)
        at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:584)
        at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
        at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:29
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   Locked ownable synchronizers:
        - <0x00000007343591f8> (a java.util.concurrent.ThreadPoolExecutor$Worker)

1 reaction
pongad commented, Jan 22, 2018

This seems related to #1795. The underlying client provides no way to initiate a flush.

@sparhomenko Do you have a sense of how long we’re waiting? The above bug might cause us to wait a few seconds, but it shouldn’t cause a prolonged hang. I agree that the wait with no timeout is a problem, but I want to understand the problem better before sending fixes.
