Issue when creating new topics with TopicOperator
Describe the bug Some of the topics are not being created by the Topic Operator. I am creating around 40 topics (16 partitions, 3 replicas each) on a Kafka cluster of 3 brokers. The issue is intermittent: sometimes all the topics are created, but other times 5-10 of them are not. The Kubernetes custom resource (KafkaTopic) exists, but the actual Kafka topic does not.
To Reproduce Steps to reproduce the behavior:
- Create a KafkaTopic custom resource
- Using Helm, loop over the KafkaTopic resource to create 40+ topics
- List the KafkaTopic resources that are not in the Ready state with the following kubectl command:
kubectl get kt -n kafka --context=testing -o json | jq -r '[.items[] |select(.status.conditions[0].type != "Ready")| .metadata.name]'
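The jq filter above only looks at the first entry of `.status.conditions`. As a rough sketch of a slightly stricter check, the Python below treats a topic as ready only if some condition has type `Ready` and status `True`, and it also reports resources that have no status yet. The function name `not_ready_topics` and the sample data are made up for illustration; the input is assumed to be the JSON printed by `kubectl get kt -o json`.

```python
import json
from typing import List


def not_ready_topics(kubectl_json: str) -> List[str]:
    """Return names of KafkaTopic items without a Ready=True condition."""
    names = []
    for item in json.loads(kubectl_json).get("items", []):
        # A freshly created resource may have no status block at all.
        conditions = (item.get("status") or {}).get("conditions") or []
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            names.append(item["metadata"]["name"])
    return names


# Hypothetical sample shaped like `kubectl get kt -o json` output.
sample = json.dumps({
    "items": [
        {"metadata": {"name": "topic-a"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "topic-b"},
         "status": {"conditions": [{"type": "NotReady", "status": "True"}]}},
        {"metadata": {"name": "topic-c"}},
    ]
})
print(not_ready_topics(sample))
```

To run it against a real cluster, the same function could be fed the text read from standard input after piping the kubectl output into the script.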
Expected behavior All the topics should be created without any issue.
Environment (please complete the following information):
- Strimzi version: 0.22.1
- Installation method: Helm chart
- Kubernetes cluster: Kubernetes 1.20
- Infrastructure: Amazon EKS
- Kafka version: 2.7.0
Logs The Topic Operator logs do not contain any substantial information either. This is the status of a topic that is in the NotReady state:
Status:
Conditions:
Last Transition Time: 2021-11-25T06:07:27.477907Z
Message: Call(callName=createTopics, deadlineMs=1637820447475, tries=1, nextAllowedTryMs=1637820447577) timed out at 1637820447477 after 1 attempt(s)
Reason: TimeoutException
Status: True
Type: NotReady
Observed Generation: 1
Events:
Type Reason Age From Message
Warning <unknown> io.strimzi.operator.topic.TopicOperator Failure processing KafkaTopic watch event ADDED on resource <Topic-Name> with labels {app.kubernetes.io/managed-by=Helm, strimzi.io/cluster=kafka, tenant-id=<tenant-id>}: Call(callName=createTopics, deadlineMs=1637820447475, tries=1, nextAllowedTryMs=1637820447577) timed out at 1637820447477 after 1 attempt(s)
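The numbers in the timeout message are epoch timestamps in milliseconds, so they can be lined up with the Last Transition Time shown in the status. The sketch below is only an illustration of how to read them; the helper names `parse_timeout` and `to_utc` and the key `timedOutAtMs` are invented here and are not part of Strimzi or Kafka.

```python
import re
from datetime import datetime, timedelta, timezone

# Message copied from the KafkaTopic status above.
MESSAGE = (
    "Call(callName=createTopics, deadlineMs=1637820447475, tries=1, "
    "nextAllowedTryMs=1637820447577) timed out at 1637820447477 "
    "after 1 attempt(s)"
)


def parse_timeout(message):
    """Extract the millisecond fields from the timeout message."""
    fields = {k: int(v) for k, v in re.findall(r"(\w+Ms)=(\d+)", message)}
    match = re.search(r"timed out at (\d+)", message)
    fields["timedOutAtMs"] = int(match.group(1))
    return fields


def to_utc(ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(milliseconds=ms)


fields = parse_timeout(MESSAGE)
print(to_utc(fields["timedOutAtMs"]).isoformat())
print(fields["timedOutAtMs"] - fields["deadlineMs"], "ms past the deadline")
print(fields["nextAllowedTryMs"] - fields["timedOutAtMs"], "ms until the next allowed try")
```

Read this way, the call was abandoned a couple of milliseconds after its deadline, after a single attempt, at the same instant as the Last Transition Time recorded in the status.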
Additional context I have a couple of questions:
1. We have around 80 test customers with 40 topics each (16 partitions, 3 replicas), which makes around 150k partition replicas in the cluster. Can a Kafka cluster of 3 brokers handle that?
2. How can we start multiple instances of the Entity Operator so that the topic management load is distributed and we don’t end up in this kind of race condition?
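For reference, the arithmetic behind the figure in the first question can be written out as below. This is just a back-of-the-envelope sketch using the numbers quoted in the question; the variable names are illustrative, and the per-broker figure assumes replicas are spread evenly across the 3 brokers.

```python
customers = 80
topics_per_customer = 40
partitions_per_topic = 16
replication_factor = 3
brokers = 3

topics = customers * topics_per_customer
partitions = topics * partitions_per_topic
replicas = partitions * replication_factor
# Assumes an even spread; with 3 replicas on 3 brokers every broker
# ends up holding one copy of every partition.
replicas_per_broker = replicas // brokers

print("topics:", topics)
print("partitions:", partitions)
print("partition replicas:", replicas)
print("replicas per broker:", replicas_per_broker)
```

So the roughly 150k figure counts replicas; the number of distinct partitions is about a third of that, and each broker would hold one replica of each.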
This does seem related to #1775. Let me know if more details are required.
Top GitHub Comments
I guess it could be as @sknot-rh suggested: you might be reaching the limits of the system. Increasing the resources for the Kafka cluster or for the Topic Operator might help, but it is not an exact science, so you would need to give it a try.
Ok, let me try increasing that and share the results.