Topic Operator failing to start with io.vertx.core.VertxException: Thread blocked
Describe the bug
When deploying a very simple cluster with the topicOperator enabled, the topic operator container fails to start. The container logs report a blocked thread, and the Kubernetes liveness probe eventually kills the container.
2021-12-16 00:16:50,79115 WARN [vertx-blocked-thread-checker] BlockedThreadChecker: - Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 2542 ms, time limit is 2000 ms
2021-12-16 00:16:51,79090 WARN [vertx-blocked-thread-checker] BlockedThreadChecker: - Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 3542 ms, time limit is 2000 ms
2021-12-16 00:16:52,79034 WARN [vertx-blocked-thread-checker] BlockedThreadChecker: - Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 4541 ms, time limit is 2000 ms
2021-12-16 00:16:53,79105 WARN [vertx-blocked-thread-checker] BlockedThreadChecker: - Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 5542 ms, time limit is 2000 ms
io.vertx.core.VertxException: Thread blocked
at jdk.internal.misc.Unsafe.park(Native Method) ~[?:?]
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:194) ~[?:?]
at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1796) ~[?:?]
at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3128) ~[?:?]
at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1823) ~[?:?]
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1998) ~[?:?]
at io.apicurio.registry.utils.ConcurrentUtil.get(ConcurrentUtil.java:35) ~[io.apicurio.apicurio-registry-common-1.3.2.Final.jar:?]
at io.apicurio.registry.utils.ConcurrentUtil.get(ConcurrentUtil.java:27) ~[io.apicurio.apicurio-registry-common-1.3.2.Final.jar:?]
at io.apicurio.registry.utils.ConcurrentUtil.result(ConcurrentUtil.java:54) ~[io.apicurio.apicurio-registry-common-1.3.2.Final.jar:?]
at io.strimzi.operator.topic.Session.lambda$start$9(Session.java:198) ~[io.strimzi.topic-operator-0.26.0.jar:0.26.0]
at io.strimzi.operator.topic.Session$$Lambda$278/0x0000000840319840.handle(Unknown Source) ~[?:?]
at io.vertx.core.impl.future.FutureImpl$3.onSuccess(FutureImpl.java:141) ~[io.vertx.vertx-core-4.1.5.jar:4.1.5]
at io.vertx.core.impl.future.FutureBase.lambda$emitSuccess$0(FutureBase.java:54) ~[io.vertx.vertx-core-4.1.5.jar:4.1.5]
at io.vertx.core.impl.future.FutureBase$$Lambda$293/0x000000084031e040.run(Unknown Source) ~[?:?]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[io.netty.netty-common-4.1.68.Final.jar:4.1.68.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) ~[io.netty.netty-common-4.1.68.Final.jar:4.1.68.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) ~[io.netty.netty-transport-4.1.68.Final.jar:4.1.68.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[io.netty.netty-common-4.1.68.Final.jar:4.1.68.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[io.netty.netty-common-4.1.68.Final.jar:4.1.68.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty.netty-common-4.1.68.Final.jar:4.1.68.Final]
at java.lang.Thread.run(Thread.java:829) ~[?:?]
To Reproduce
Steps to reproduce the behavior:
- Install Strimzi Operator using the 0.26.0 helm chart
- Create a Cluster manifest:
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: kafka-basic
spec:
  kafka:
    version: 3.0.0
    replicas: 1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    storage:
      type: ephemeral
  zookeeper:
    replicas: 1
    storage:
      type: ephemeral
  entityOperator:
    topicOperator: {}
    userOperator: {}
- Apply the manifest with
kubectl apply -f kafka-basic.yaml
- Watch the topic operator logs with
kubectl logs deploy/kafka-basic-entity-operator -c topic-operator
Expected behavior
The topic operator starts correctly.
Environment:
- Strimzi version: 0.26.0
- Installation method: Helm chart
- Kubernetes cluster: Kubernetes 1.20.7
- Infrastructure: Amazon EKS
YAML files and logs
Thanks for the handy script! report-16-12-2021_11-26-59.zip
Additional context
Similar errors show up in these issues:
- https://github.com/strimzi/strimzi-kafka-operator/issues/383
- https://github.com/strimzi/strimzi-kafka-operator/issues/1050
- https://github.com/strimzi/strimzi-kafka-operator/issues/4964
Increasing the resource requests and limits for the topic operator didn't change the behaviour.
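For reference, the bump was applied roughly along these lines in the Kafka CR; the field path spec.entityOperator.topicOperator.resources is standard Strimzi, but the request/limit values shown are illustrative rather than the exact ones used:

```yaml
# Sketch: raising topic operator resources via the Kafka CR.
# The request/limit values below are illustrative only.
spec:
  entityOperator:
    topicOperator:
      resources:
        requests:
          cpu: 500m
          memory: 512Mi
        limits:
          cpu: "1"
          memory: 1Gi
```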
Zookeeper doesn’t show any errors or timeouts.
Also running into this.
For the time being, I am defaulting back to the ZooKeeper topic store instead of the Kafka Streams topic store by doing the following:
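A sketch of that override, assuming the STRIMZI_USE_ZOOKEEPER_TOPIC_STORE environment variable is still honoured by the 0.26.x topic operator and that it can be injected through the entity operator's container template in the Kafka CR:

```yaml
# Sketch: force the topic operator back to the ZooKeeper-based topic store.
# Assumes STRIMZI_USE_ZOOKEEPER_TOPIC_STORE is still supported in this version.
spec:
  entityOperator:
    topicOperator: {}
    userOperator: {}
    template:
      topicOperatorContainer:
        env:
          - name: STRIMZI_USE_ZOOKEEPER_TOPIC_STORE
            value: "true"
```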
Using ZK for now is fine, but as you note ZK will eventually disappear. So I guess overriding is fine in the short term.