
Create ledger timeout and BookKeeperClientWorker thread stuck

See original GitHub issue

Describe the bug

We see many timeout logs when creating a producer or sending messages. For example, we create a new topic and then build a producer to send messages, and we see errors like this:

java.util.concurrent.CompletionException: org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Failed to load topic within timeout 
        at java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source) ~[?:?]
        at java.util.concurrent.CompletableFuture.completeThrowable(Unknown Source) ~[?:?]
        at java.util.concurrent.CompletableFuture$UniApply.tryFire(Unknown Source) ~[?:?]
        at java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:?]
        at java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source) ~[?:?]
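
For reference, the producer that runs into this is built with the standard Pulsar Java client, roughly as in the sketch below (the service URL, tenant/namespace, and topic name are placeholders, not our real values):

import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;
import org.apache.pulsar.client.api.Schema;

public class ProducerSketch {
    public static void main(String[] args) throws PulsarClientException {
        // placeholder service URL and topic name
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://broker.example.com:6650")
                .build();
        try (Producer<String> producer = client.newProducer(Schema.STRING)
                .topic("persistent://my-tenant/my-namespace/new-topic")
                .create()) {              // producer creation is the step that hits the timeout
            producer.send("hello");
        } finally {
            client.close();
        }
    }
}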

We checked the stack trace of the create-ledger path and found that when ledger creation completes, the completion callback cannot be run because the BookKeeperClientWorker thread is never available to execute it.

[image attached in the original issue]

We took a thread dump; BookKeeperClientWorker-OrderedExecutor-0-0 is always BLOCKED:

"BookKeeperClientScheduler-OrderedScheduler-0-0" #26 prio=5 os_prio=0 cpu=61864.97ms elapsed=268089.10s tid=0x00007f88598ef800 nid=0x17b in Object.wait()  [0x00007f87fa5e1000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(java.base@11.0.12/Native Method)
        - waiting on <no object reference available>
        at java.lang.Object.wait(java.base@11.0.12/Unknown Source)
        at io.netty.util.concurrent.DefaultPromise.awaitUninterruptibly(DefaultPromise.java:275)
        - waiting to re-lock in wait() <0x00000000cb13f9c0> (a io.netty.channel.DefaultChannelPromise)
        at io.netty.channel.DefaultChannelPromise.awaitUninterruptibly(DefaultChannelPromise.java:137)
        at io.netty.channel.DefaultChannelPromise.awaitUninterruptibly(DefaultChannelPromise.java:30)
        at org.apache.bookkeeper.proto.PerChannelBookieClient.closeInternal(PerChannelBookieClient.java:1081)
        at org.apache.bookkeeper.proto.PerChannelBookieClient.disconnect(PerChannelBookieClient.java:1034)
        at org.apache.bookkeeper.proto.PerChannelBookieClient.disconnect(PerChannelBookieClient.java:1029)
        at org.apache.bookkeeper.proto.PerChannelBookieClient.failTLS(PerChannelBookieClient.java:2555)
        - locked <0x00000000b8293218> (a org.apache.bookkeeper.proto.PerChannelBookieClient)
        at org.apache.bookkeeper.proto.PerChannelBookieClient.access$2900(PerChannelBookieClient.java:154)
        at org.apache.bookkeeper.proto.PerChannelBookieClient$StartTLSCompletion.errorOut(PerChannelBookieClient.java:1961)
        at org.apache.bookkeeper.proto.PerChannelBookieClient$CompletionValue.timeout(PerChannelBookieClient.java:1614)
        at org.apache.bookkeeper.proto.PerChannelBookieClient$CompletionValue.maybeTimeout(PerChannelBookieClient.java:1606)
        at org.apache.bookkeeper.proto.PerChannelBookieClient.lambda$static$3(PerChannelBookieClient.java:1011)
        at org.apache.bookkeeper.proto.PerChannelBookieClient$$Lambda$596/0x00000001006f8440.test(Unknown Source)
        at org.apache.bookkeeper.util.collections.ConcurrentOpenHashMap$Section.removeIf(ConcurrentOpenHashMap.java:411)
        at org.apache.bookkeeper.util.collections.ConcurrentOpenHashMap.removeIf(ConcurrentOpenHashMap.java:172)
        at org.apache.bookkeeper.proto.PerChannelBookieClient.checkTimeoutOnPendingOperations(PerChannelBookieClient.java:1015)
        at org.apache.bookkeeper.proto.DefaultPerChannelBookieClientPool.checkTimeoutOnPendingOperations(DefaultPerChannelBookieClientPool.java:132)
        at org.apache.bookkeeper.proto.BookieClientImpl.monitorPendingOperations(BookieClientImpl.java:572)
        at org.apache.bookkeeper.proto.BookieClientImpl.lambda$new$0(BookieClientImpl.java:131)
        at org.apache.bookkeeper.proto.BookieClientImpl$$Lambda$163/0x000000010038e840.run(Unknown Source)
        at org.apache.bookkeeper.util.SafeRunnable$1.safeRun(SafeRunnable.java:43)
        at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36)
        at com.google.common.util.concurrent.MoreExecutors$ScheduledListeningDecorator$NeverSuccessfulListenableFutureTask.run(MoreExecutors.java:705)
        at java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.12/Unknown Source)
        at java.util.concurrent.FutureTask.runAndReset(java.base@11.0.12/Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java.base@11.0.12/Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.12/Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.12/Unknown Source)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(java.base@11.0.12/Unknown Source)

"BookKeeperClientWorker-OrderedExecutor-0-0" #27 prio=5 os_prio=0 cpu=616668.69ms elapsed=268089.10s tid=0x00007f88598f3000 nid=0x17c waiting for monitor entry  [0x00007f87fa4e0000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.bookkeeper.proto.PerChannelBookieClient.connectIfNeededAndDoOp(PerChannelBookieClient.java:631)
        - waiting to lock <0x00000000b8293218> (a org.apache.bookkeeper.proto.PerChannelBookieClient)
        at org.apache.bookkeeper.proto.DefaultPerChannelBookieClientPool.obtain(DefaultPerChannelBookieClientPool.java:121)
        at org.apache.bookkeeper.proto.DefaultPerChannelBookieClientPool.obtain(DefaultPerChannelBookieClientPool.java:116)
        at org.apache.bookkeeper.proto.BookieClientImpl.addEntry(BookieClientImpl.java:329)
        at org.apache.bookkeeper.client.PendingAddOp.sendWriteRequest(PendingAddOp.java:152)
        at org.apache.bookkeeper.client.PendingAddOp.safeRun(PendingAddOp.java:278)
        at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.12/Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.12/Unknown Source)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(java.base@11.0.12/Unknown Source)
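
Read together, the two dumps show the scheduler thread doing a blocking wait in awaitUninterruptibly() while it still holds the PerChannelBookieClient monitor it took in failTLS(), while the worker thread is BLOCKED trying to enter connectIfNeededAndDoOp() on the same object, so add-entry work and the create-ledger callbacks never get to run. A minimal, self-contained sketch of that general pattern (hypothetical class and method names, not BookKeeper's actual code) looks like this:

import java.util.concurrent.CountDownLatch;

public class BlockingWhileLockedSketch {
    private final CountDownLatch channelClosed = new CountDownLatch(1); // never counted down in this sketch

    // Plays the role of failTLS() -> disconnect() -> awaitUninterruptibly():
    // a blocking wait performed while the object's monitor is still held.
    public synchronized void timeoutAndDisconnect() throws InterruptedException {
        channelClosed.await();
    }

    // Plays the role of connectIfNeededAndDoOp(): it needs the same monitor,
    // so in a thread dump it shows up as BLOCKED.
    public synchronized void addEntry() {
        // the write request would be sent here
    }

    public static void main(String[] args) throws Exception {
        BlockingWhileLockedSketch client = new BlockingWhileLockedSketch();
        Thread scheduler = new Thread(() -> {
            try {
                client.timeoutAndDisconnect();
            } catch (InterruptedException ignored) {
                // interrupted only so this sketch can exit cleanly
            }
        }, "scheduler");
        Thread worker = new Thread(client::addEntry, "worker");

        scheduler.start();
        Thread.sleep(200);          // let the scheduler grab the monitor first
        worker.start();
        Thread.sleep(200);
        System.out.println("worker state: " + worker.getState()); // BLOCKED

        scheduler.interrupt();      // release the monitor so the sketch can finish
        worker.join();
        scheduler.join();
    }
}

Running the sketch normally prints "worker state: BLOCKED", which is exactly how BookKeeperClientWorker-OrderedExecutor-0-0 appears in the dump above.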

To Reproduce

We do not know how to reproduce this reliably.

Additional context

Pulsar version: 2.9.1

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
fistan684 commented, Feb 18, 2022

Since upgrading from 2.7.1 to 2.9.1 we have also been hitting issues where our producers begin failing. Typically one broker stops responding to requests, and restarting it fixes the issue. In the broker logs we see an error similar to the one described in this issue:

2022-02-18T17:35:06,772+0000 [pulsar-io-4-7] WARN org.apache.pulsar.broker.service.ServerCnx - [/10.100.209.43:51632][persistent://prod/voltron-general/871_2b2d84be150dcf9c_MAID_DELETE_6333758_4bb66664126194f7-partition-0][voltron] Failed to create consumer: consumerId=0, Failed to load topic within timeout
java.util.concurrent.CompletionException: org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Failed to load topic within timeout
        ...
        at org.apache.pulsar.common.util.FutureUtil.lambda$addTimeoutHandling$1(FutureUtil.java:141) ~[org.apache.pulsar-pulsar-common-2.9.1.jar:2.9.1]
        at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) [io.netty-netty-common-4.1.72.Final.jar:4.1.72.Final]
        ...
Caused by: org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Failed to load topic within timeout

This seems to be happening randomly every 4-7 hours since we upgraded. We typically write to a lot of topics in a given namespace. We’ll try to capture thread state next time it happens.
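
One way to capture that thread state automatically, from inside the broker JVM, is the standard java.lang.management ThreadMXBean API. The sketch below is illustrative only (the thread-name prefix, the poll interval, and running it as a standalone loop are assumptions); jstack against the broker PID gives the same information from outside the process:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class BlockedBookieThreadMonitor {
    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        while (true) {
            // true, true also reports the monitors and synchronizers each thread holds
            for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
                boolean bookieClientThread = info.getThreadName().startsWith("BookKeeperClient");
                if (bookieClientThread && info.getThreadState() == Thread.State.BLOCKED) {
                    System.out.printf("%s is BLOCKED on %s held by %s%n",
                            info.getThreadName(), info.getLockName(), info.getLockOwnerName());
                    for (StackTraceElement frame : info.getStackTrace()) {
                        System.out.println("        at " + frame);
                    }
                }
            }
            Thread.sleep(60_000); // check once a minute
        }
    }
}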

0 reactions
github-actions[bot] commented, May 28, 2022

The issue had no activity for 30 days, mark with Stale label.

