Topic Operator failing to start with io.vertx.core.VertxException: Thread blocked

Describe the bug

When deploying a very simple cluster with the topicOperator enabled, the topicOperator container fails to start. The logs for the container report a blocked thread. The k8s liveness check eventually kills the container.

2021-12-16 00:16:50,79115 WARN  [vertx-blocked-thread-checker] BlockedThreadChecker: - Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 2542 ms, time limit is 2000 ms
2021-12-16 00:16:51,79090 WARN  [vertx-blocked-thread-checker] BlockedThreadChecker: - Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 3542 ms, time limit is 2000 ms
2021-12-16 00:16:52,79034 WARN  [vertx-blocked-thread-checker] BlockedThreadChecker: - Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 4541 ms, time limit is 2000 ms
2021-12-16 00:16:53,79105 WARN  [vertx-blocked-thread-checker] BlockedThreadChecker: - Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 5542 ms, time limit is 2000 ms
io.vertx.core.VertxException: Thread blocked
	at jdk.internal.misc.Unsafe.park(Native Method) ~[?:?]
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:194) ~[?:?]
	at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1796) ~[?:?]
	at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3128) ~[?:?]
	at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1823) ~[?:?]
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1998) ~[?:?]
	at io.apicurio.registry.utils.ConcurrentUtil.get(ConcurrentUtil.java:35) ~[io.apicurio.apicurio-registry-common-1.3.2.Final.jar:?]
	at io.apicurio.registry.utils.ConcurrentUtil.get(ConcurrentUtil.java:27) ~[io.apicurio.apicurio-registry-common-1.3.2.Final.jar:?]
	at io.apicurio.registry.utils.ConcurrentUtil.result(ConcurrentUtil.java:54) ~[io.apicurio.apicurio-registry-common-1.3.2.Final.jar:?]
	at io.strimzi.operator.topic.Session.lambda$start$9(Session.java:198) ~[io.strimzi.topic-operator-0.26.0.jar:0.26.0]
	at io.strimzi.operator.topic.Session$$Lambda$278/0x0000000840319840.handle(Unknown Source) ~[?:?]
	at io.vertx.core.impl.future.FutureImpl$3.onSuccess(FutureImpl.java:141) ~[io.vertx.vertx-core-4.1.5.jar:4.1.5]
	at io.vertx.core.impl.future.FutureBase.lambda$emitSuccess$0(FutureBase.java:54) ~[io.vertx.vertx-core-4.1.5.jar:4.1.5]
	at io.vertx.core.impl.future.FutureBase$$Lambda$293/0x000000084031e040.run(Unknown Source) ~[?:?]
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[io.netty.netty-common-4.1.68.Final.jar:4.1.68.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) ~[io.netty.netty-common-4.1.68.Final.jar:4.1.68.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) ~[io.netty.netty-transport-4.1.68.Final.jar:4.1.68.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[io.netty.netty-common-4.1.68.Final.jar:4.1.68.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[io.netty.netty-common-4.1.68.Final.jar:4.1.68.Final]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty.netty-common-4.1.68.Final.jar:4.1.68.Final]
	at java.lang.Thread.run(Thread.java:829) ~[?:?]
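
For readers unfamiliar with this warning: the trace shows `Session.lambda$start$9` reaching `CompletableFuture.get` via `ConcurrentUtil.result` while running on `vert.x-eventloop-thread-0`, so the event loop thread is parked until that future completes. The following is only a minimal sketch of that general pattern, using plain JDK classes as a stand-in for the event loop. It is not Strimzi or Vert.x code, and every name in it (`EventLoopBlockingSketch`, `slowStartup`, `blockingStyle`, `callbackStyle`) is hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustration only: a single-threaded executor stands in for the event loop.
// All names are hypothetical; none of this is Strimzi or Vert.x API.
final class EventLoopBlockingSketch {

    // Stand-in for an asynchronous startup step that completes after delayMs.
    static CompletableFuture<String> slowStartup(long delayMs) {
        return CompletableFuture.supplyAsync(
                () -> "ready",
                CompletableFuture.delayedExecutor(delayMs, TimeUnit.MILLISECONDS));
    }

    // Pattern visible in the trace: the loop thread parks in get() until the
    // future completes. Returns how long the loop thread was held, in ms.
    static long blockingStyle(ExecutorService loop, long delayMs) throws Exception {
        return loop.submit(() -> {
            long start = System.nanoTime();
            slowStartup(delayMs).get(); // loop thread can do nothing else here
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        }).get();
    }

    // Alternative: register a callback and return at once, so the loop thread
    // is free while the slow step runs. Returns the time the loop was held.
    static long callbackStyle(ExecutorService loop, long delayMs) throws Exception {
        CompletableFuture<String> result = new CompletableFuture<>();
        long heldMs = loop.submit(() -> {
            long start = System.nanoTime();
            slowStartup(delayMs).thenAcceptAsync(result::complete, loop);
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        }).get();
        // Wait (on the caller's thread, not the loop) for the callback to fire.
        result.get(delayMs + 5_000, TimeUnit.MILLISECONDS);
        return heldMs;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService loop = Executors.newSingleThreadExecutor();
        try {
            System.out.println("blocking style held the loop for ~"
                    + blockingStyle(loop, 500) + " ms");
            System.out.println("callback style held the loop for ~"
                    + callbackStyle(loop, 500) + " ms");
        } finally {
            loop.shutdown();
        }
    }
}
```

In the blocking variant the loop thread is held for roughly the full delay, which is the situation the `BlockedThreadChecker` reports once the 2000 ms limit is exceeded. Whether the topic operator can avoid the blocking call in this way is not something this sketch establishes.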

To Reproduce

Steps to reproduce the behavior:

  1. Install Strimzi Operator using the 0.26.0 helm chart
  2. Create a Cluster manifest:
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: kafka-basic
spec:
  kafka:
    version: 3.0.0
    replicas: 1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    storage:
      type: ephemeral
  zookeeper:   
    replicas: 1
    storage:
      type: ephemeral
  entityOperator:
    topicOperator: {}
    userOperator: {}
  3. Apply the manifest with kubectl apply -f kafka-basic.yaml
  4. Watch the topic operator logs with kubectl logs deploy/kafka-basic-entity-operator -c topic-operator

Expected behavior

The topic operator starts correctly.

Environment:

  • Strimzi version: 0.26.0
  • Installation method: Helm chart
  • Kubernetes cluster: Kubernetes 1.20.7
  • Infrastructure: Amazon EKS

YAML files and logs

Thanks for the handy script! report-16-12-2021_11-26-59.zip

Additional context

Similar errors show up in these issues:

  • https://github.com/strimzi/strimzi-kafka-operator/issues/383
  • https://github.com/strimzi/strimzi-kafka-operator/issues/1050
  • https://github.com/strimzi/strimzi-kafka-operator/issues/4964

Increasing the resource claims for the topic operator didn’t change the behaviour.

Zookeeper doesn’t show any errors or timeouts.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 23 (9 by maintainers)

Top GitHub Comments

7 reactions
danlenar commented, Jan 11, 2022

Also running into this.

For the time being, I am falling back to the ZooKeeper topic store instead of the Kafka Streams topic store by setting the following:

  entityOperator:
    template:
      topicOperatorContainer:
        env:
        - name: STRIMZI_USE_ZOOKEEPER_TOPIC_STORE
          value: "true"

2 reactions
tombentley commented, Feb 11, 2022

Using ZK for now is fine, but as you note ZK will eventually disappear. So I guess overriding is fine in the short term.


