Create ledger timeout and BookKeeperClientWorker thread stuck
Describe the bug
We found many timeout logs when creating a producer or sending messages. For example, we create a new topic and then build a producer to send a message, and see the following:
java.util.concurrent.CompletionException: org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Failed to load topic within timeout
at java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source) ~[?:?]
at java.util.concurrent.CompletableFuture.completeThrowable(Unknown Source) ~[?:?]
at java.util.concurrent.CompletableFuture$UniApply.tryFire(Unknown Source) ~[?:?]
at java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:?]
at java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source) ~[?:?]
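For context, the client-side operation that hits this timeout is roughly the following (a minimal sketch; the service URL and topic name are placeholders, not our real configuration):

import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class ProducerExample {
    public static void main(String[] args) throws Exception {
        // Placeholder service URL
        try (PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build()) {
            // Producer creation is where "Failed to load topic within timeout" surfaces,
            // because the broker cannot finish opening the managed ledger for the new topic.
            Producer<String> producer = client.newProducer(Schema.STRING)
                    .topic("persistent://public/default/new-topic") // placeholder topic
                    .create();
            producer.send("hello");
            producer.close();
        }
    }
}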
We checked the stack trace of the create-ledger path and found that when ledger creation completes, the callback cannot run because the BookKeeperClientWorker thread is unavailable.
We dumped the thread stacks: BookKeeperClientWorker-OrderedExecutor-0-0 is always BLOCKED.
"BookKeeperClientScheduler-OrderedScheduler-0-0" #26 prio=5 os_prio=0 cpu=61864.97ms elapsed=268089.10s tid=0x00007f88598ef800 nid=0x17b in Object.wait() [0x00007f87fa5e1000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(java.base@11.0.12/Native Method)
- waiting on <no object reference available>
at java.lang.Object.wait(java.base@11.0.12/Unknown Source)
at io.netty.util.concurrent.DefaultPromise.awaitUninterruptibly(DefaultPromise.java:275)
- waiting to re-lock in wait() <0x00000000cb13f9c0> (a io.netty.channel.DefaultChannelPromise)
at io.netty.channel.DefaultChannelPromise.awaitUninterruptibly(DefaultChannelPromise.java:137)
at io.netty.channel.DefaultChannelPromise.awaitUninterruptibly(DefaultChannelPromise.java:30)
at org.apache.bookkeeper.proto.PerChannelBookieClient.closeInternal(PerChannelBookieClient.java:1081)
at org.apache.bookkeeper.proto.PerChannelBookieClient.disconnect(PerChannelBookieClient.java:1034)
at org.apache.bookkeeper.proto.PerChannelBookieClient.disconnect(PerChannelBookieClient.java:1029)
at org.apache.bookkeeper.proto.PerChannelBookieClient.failTLS(PerChannelBookieClient.java:2555)
- locked <0x00000000b8293218> (a org.apache.bookkeeper.proto.PerChannelBookieClient)
at org.apache.bookkeeper.proto.PerChannelBookieClient.access$2900(PerChannelBookieClient.java:154)
at org.apache.bookkeeper.proto.PerChannelBookieClient$StartTLSCompletion.errorOut(PerChannelBookieClient.java:1961)
at org.apache.bookkeeper.proto.PerChannelBookieClient$CompletionValue.timeout(PerChannelBookieClient.java:1614)
at org.apache.bookkeeper.proto.PerChannelBookieClient$CompletionValue.maybeTimeout(PerChannelBookieClient.java:1606)
at org.apache.bookkeeper.proto.PerChannelBookieClient.lambda$static$3(PerChannelBookieClient.java:1011)
at org.apache.bookkeeper.proto.PerChannelBookieClient$$Lambda$596/0x00000001006f8440.test(Unknown Source)
at org.apache.bookkeeper.util.collections.ConcurrentOpenHashMap$Section.removeIf(ConcurrentOpenHashMap.java:411)
at org.apache.bookkeeper.util.collections.ConcurrentOpenHashMap.removeIf(ConcurrentOpenHashMap.java:172)
at org.apache.bookkeeper.proto.PerChannelBookieClient.checkTimeoutOnPendingOperations(PerChannelBookieClient.java:1015)
at org.apache.bookkeeper.proto.DefaultPerChannelBookieClientPool.checkTimeoutOnPendingOperations(DefaultPerChannelBookieClientPool.java:132)
at org.apache.bookkeeper.proto.BookieClientImpl.monitorPendingOperations(BookieClientImpl.java:572)
at org.apache.bookkeeper.proto.BookieClientImpl.lambda$new$0(BookieClientImpl.java:131)
at org.apache.bookkeeper.proto.BookieClientImpl$$Lambda$163/0x000000010038e840.run(Unknown Source)
at org.apache.bookkeeper.util.SafeRunnable$1.safeRun(SafeRunnable.java:43)
at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36)
at com.google.common.util.concurrent.MoreExecutors$ScheduledListeningDecorator$NeverSuccessfulListenableFutureTask.run(MoreExecutors.java:705)
at java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.12/Unknown Source)
at java.util.concurrent.FutureTask.runAndReset(java.base@11.0.12/Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java.base@11.0.12/Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.12/Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.12/Unknown Source)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(java.base@11.0.12/Unknown Source)
"BookKeeperClientWorker-OrderedExecutor-0-0" #27 prio=5 os_prio=0 cpu=616668.69ms elapsed=268089.10s tid=0x00007f88598f3000 nid=0x17c waiting for monitor entry [0x00007f87fa4e0000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.bookkeeper.proto.PerChannelBookieClient.connectIfNeededAndDoOp(PerChannelBookieClient.java:631)
- waiting to lock <0x00000000b8293218> (a org.apache.bookkeeper.proto.PerChannelBookieClient)
at org.apache.bookkeeper.proto.DefaultPerChannelBookieClientPool.obtain(DefaultPerChannelBookieClientPool.java:121)
at org.apache.bookkeeper.proto.DefaultPerChannelBookieClientPool.obtain(DefaultPerChannelBookieClientPool.java:116)
at org.apache.bookkeeper.proto.BookieClientImpl.addEntry(BookieClientImpl.java:329)
at org.apache.bookkeeper.client.PendingAddOp.sendWriteRequest(PendingAddOp.java:152)
at org.apache.bookkeeper.client.PendingAddOp.safeRun(PendingAddOp.java:278)
at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.12/Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.12/Unknown Source)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(java.base@11.0.12/Unknown Source)
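Reading the two dumps together: the scheduler thread takes the PerChannelBookieClient monitor in the synchronized failTLS() and then blocks inside closeInternal() waiting on a Netty channel promise, while the worker thread is BLOCKED trying to enter the same monitor in connectIfNeededAndDoOp(), so the add-entry and create-ledger callbacks never run. A simplified sketch of that pattern follows (class and method names are illustrative, not the actual BookKeeper code):

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustrative only: thread A waits for an external event while holding the
// object monitor; thread B needs the same monitor to make any progress.
public class MonitorHeldWhileWaiting {
    // Never counted down, standing in for the channel-close promise that never completes.
    private final CountDownLatch channelClosed = new CountDownLatch(1);

    // Analogue of failTLS() -> disconnect() -> closeInternal().awaitUninterruptibly():
    // 'this' is still held while we block.
    public synchronized void failAndDisconnect() throws InterruptedException {
        channelClosed.await(10, TimeUnit.MINUTES);
    }

    // Analogue of connectIfNeededAndDoOp(): BLOCKED on the monitor held above.
    public synchronized void addEntry(Runnable writeRequest) {
        writeRequest.run();
    }

    public static void main(String[] args) {
        MonitorHeldWhileWaiting client = new MonitorHeldWhileWaiting();
        new Thread(() -> {
            try { client.failAndDisconnect(); } catch (InterruptedException ignored) { }
        }, "scheduler-like").start();
        new Thread(() -> client.addEntry(() -> System.out.println("write sent")),
                "worker-like").start();
    }
}

A thread dump of this sketch shows the same shape as the dumps above: one thread waiting while holding the monitor, the other BLOCKED trying to acquire it.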
To Reproduce
I don't know how to reproduce this.
Additional context
Version: 2.9.1
Top GitHub Comments
Since upgrading from 2.7.1 to 2.9.1 we have also been hitting issues where our producers begin failing. Typically we'll have one broker that isn't responding to requests, and restarting it fixes the issue. From the broker logs we see a similar error to what is described in this issue:
2022-02-18T17:35:06,772+0000 [pulsar-io-4-7] WARN org.apache.pulsar.broker.service.ServerCnx - [/10.100.209.43:51632][persistent://prod/voltron-general/871_2b2d84be150dcf9c_MAID_DELETE_6333758_4bb66664126194f7-partition-0][voltron] Failed to create consumer: consumerId=0, Failed to load topic within timeout java.util.concurrent.CompletionException: org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Failed to load topic within timeout ... at org.apache.pulsar.common.util.FutureUtil.lambda$addTimeoutHandling$1(FutureUtil.java:141) ~[org.apache.pulsar-pulsar-common-2.9.1.jar:2.9.1] at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) [io.netty-netty-common-4.1.72.Final.jar:4.1.72.Final] ... Caused by: org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Failed to load topic within timeout
This seems to be happening randomly every 4-7 hours since we upgraded. We typically write to a lot of topics in a given namespace. We’ll try to capture thread state next time it happens.
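Not from the original comment, but one way to capture thread state when a broker gets into this condition is a standard JVM thread dump (for example jstack against the broker process); the sketch below does the equivalent in-process through the java.lang.management API:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Prints every thread's state, held monitors and stack frames,
// roughly what an external jstack would show for the stuck BookKeeper threads.
public class ThreadDumper {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            System.out.print(info); // note: ThreadInfo.toString() truncates deep stacks
        }
    }
}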
This issue had no activity for 30 days and was marked with the Stale label.