LoggingHandler hangs the thread
Our application is running on Google Kubernetes Engine using the gcr.io/google-appengine/jetty image and uses com.google.cloud.logging.LoggingHandler to publish logs to Stackdriver. We noticed some worker threads becoming unresponsive over time. When the pod is shutting down, we see the following exception for each of them:
java.lang.RuntimeException: java.lang.InterruptedException
    at com.google.cloud.logging.LoggingImpl.flush(LoggingImpl.java:545)
    at com.google.cloud.logging.LoggingImpl.write(LoggingImpl.java:525)
    at com.google.cloud.logging.LoggingHandler.publish(LoggingHandler.java:273)
    at java.util.logging.Logger.log(Logger.java:738)
    at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:582)
    at org.slf4j.impl.JDK14LoggerAdapter.error(JDK14LoggerAdapter.java:500)
    ...
Caused by: java.lang.InterruptedException
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:449)
    at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:79)
    at com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:63)
    at com.google.cloud.logging.LoggingImpl.flush(LoggingImpl.java:543)
    ... 30 more
We’ll try to extract a thread dump to see why the future never completes, but the issue seems dangerous by itself: LoggingImpl.java:543 uses the non-timeout version of Future.get(), which can cause any logger call to block the current thread forever unless interrupted. Would it be possible to use the timeout version with a reasonably big timeout, e.g. 60 seconds?
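For illustration only (this is not the library's actual flush implementation), here is a minimal sketch of the change being asked for, assuming a hypothetical list of pending write futures as a stand-in for LoggingImpl's internal state:

import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedFlushSketch {
    // Hypothetical stand-in for the pending write futures tracked by the logger.
    private final List<Future<Void>> pendingWrites;

    BoundedFlushSketch(List<Future<Void>> pendingWrites) {
        this.pendingWrites = pendingWrites;
    }

    // Current behavior, simplified: get() with no timeout can park the logging thread forever.
    void flushUnbounded() throws ExecutionException, InterruptedException {
        for (Future<Void> f : pendingWrites) {
            f.get();
        }
    }

    // Requested behavior: bound the wait so a stuck RPC surfaces as an error instead of a hang.
    void flushBounded() throws ExecutionException, InterruptedException, TimeoutException {
        for (Future<Void> f : pendingWrites) {
            f.get(60, TimeUnit.SECONDS);
        }
    }
}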
Hi,

Having enabled Stackdriver logging (via spring-cloud-gcp-starter-logging and the gcloud logback appender) about a week ago, we are facing recurring issues with it.

First of all, it appears that whenever LoggingImpl has to flush, it is a blocking, synchronous operation from the caller's perspective. I initially assumed that writeSynchronicity (which is set to ASYNC by default for logback) would ensure that all underlying API calls are offloaded to some background processing thread. That does not seem to be the case. It would be great if all asynchronous and blocking operations (and configuration options, when available/applicable) were properly documented; otherwise it involves a lot of guesswork on the part of library users.
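For what it's worth, here is a minimal sketch (not taken from the appender or the starter, just the plain google-cloud-logging API) of why the caller ends up waiting: write() hands entries to the client, but flush() completes all pending writes on the calling thread. The log name "test-log" is only a placeholder:

import com.google.cloud.logging.LogEntry;
import com.google.cloud.logging.Logging;
import com.google.cloud.logging.Logging.WriteOption;
import com.google.cloud.logging.LoggingOptions;
import com.google.cloud.logging.Payload.StringPayload;
import java.util.Collections;

public class CallerBlockedByFlush {
    public static void main(String[] args) throws Exception {
        try (Logging logging = LoggingOptions.getDefaultInstance().getService()) {
            // write() hands the entry to the client; the RPC itself may be deferred
            // depending on the configured synchronicity.
            logging.write(
                Collections.singleton(LogEntry.of(StringPayload.of("hello"))),
                WriteOption.logName("test-log"));
            // flush() waits for every pending write to complete on *this* thread, with no
            // timeout, so a stuck RPC stalls the caller rather than a background worker.
            logging.flush();
        }
    }
}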
So in order to mitigate sporadic delays caused by the gcloud logging appender, we decided to wrap it inside an AsyncAppender. Unfortunately, that made matters worse. At some point the cloud backend failed with UNAVAILABLE: Authentication backend unavailable, which completely locked up the application's processing thread. I've taken a thread dump (at the bottom of this message), and it looks like we effectively have a deadlock there. So the application is in a state where recovery is impossible.

My next step in trying to mitigate this will be increasing the queue size for AsyncAppender and enabling the neverBlock option on it (a rough sketch is shown below). That means some messages will be lost when the gcloud backend chokes up, but at least the application won't be deadlocked by it. Again, maybe it's worth updating the documentation (both the project's readme file and the Stackdriver docs) to include the recommended AsyncAppender settings.

And here comes a thread dump:
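As a rough illustration of those settings (a larger queue plus neverBlock), and not an official recommendation, the sketch below wires an already-configured Stackdriver appender through a logback AsyncAppender programmatically. The appender name "CLOUD", the wrapper name "ASYNC_CLOUD" and the queue size 8192 are placeholders; the same two properties map to the queueSize and neverBlock elements when configured in logback.xml.

import ch.qos.logback.classic.AsyncAppender;
import ch.qos.logback.classic.Logger;
import ch.qos.logback.classic.LoggerContext;
import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.Appender;
import org.slf4j.LoggerFactory;

public class WrapCloudAppender {
    public static void main(String[] args) {
        LoggerContext context = (LoggerContext) LoggerFactory.getILoggerFactory();
        Logger root = context.getLogger(org.slf4j.Logger.ROOT_LOGGER_NAME);

        // "CLOUD" is a placeholder for whatever name the Stackdriver appender
        // was given in logback.xml; adjust it to match the real configuration.
        Appender<ILoggingEvent> cloud = root.getAppender("CLOUD");

        AsyncAppender async = new AsyncAppender();
        async.setContext(context);
        async.setName("ASYNC_CLOUD");
        async.setQueueSize(8192);    // larger buffer to absorb backend slowness
        async.setNeverBlock(true);   // drop events instead of blocking when the queue is full
        async.addAppender(cloud);
        async.start();

        // Route logging through the async wrapper instead of hitting the cloud appender directly.
        root.detachAppender(cloud);
        root.addAppender(async);
    }
}

With neverBlock enabled, AsyncAppender discards events when its queue is full rather than blocking the producing thread, trading log completeness for application availability.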
This seems related to #1795. The underlying client provides no way to initiate a flush.
@sparhomenko Do you have a sense of how long we’re waiting? The above bug might cause us to wait a few seconds, but it shouldn’t cause a prolonged hang. I agree that the wait with no timeout is a problem, but I want to understand the problem better before sending fixes.