Pod could not be rotated due to under-replicated partition
Describe the bug: Unable to roll a pod during a Strimzi upgrade because of an unexpected under-replicated partition error.
To Reproduce
- Upgrade From 0.25 to 0.26.1
- Random error difficult to reproduce
Expected behavior: Hello, when upgrading to version 0.26.1 we had several cases where the pod rollout was blocked by an under-replication error that does not seem expected given the configuration in place. The problem occurred with random topics, but also with the __consumer_offsets topic.
Environment (please complete the following information):
- Strimzi version: 0.26.1
- Installation method: Helm chart
- Kubernetes cluster: Kubernetes 1.20
- Infrastructure: Amazon EKS
YAML files and logs
Cluster config (topic-related settings):
config:
  auto.create.topics.enable: 'false'
  num.partitions: 12
  default.replication.factor: 3
  min.insync.replicas: 1
  offsets.topic.replication.factor: 3
  transaction.state.log.replication.factor: 3
  transaction.state.log.min.isr: 1
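For context: with min.insync.replicas: 1, the roller's availability check should only refuse to restart a broker when that broker is the last in-sync replica of some partition, which is exactly what the ISR={0} messages in the logs below report. A quick way to confirm the effective broker-level settings, as a sketch only (it assumes a plain listener on port 9092 and the standard Strimzi pod/container layout; namespace and pod names are taken from the logs below):

```bash
# Inspect the effective configuration of broker 0 from inside its pod
# (the 9092 listener and the /opt/kafka path are assumptions about this setup).
kubectl exec -n kafka-metapro kafka-metapro-kafka-0 -c kafka -- \
  /opt/kafka/bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type brokers --entity-name 0 --describe --all \
  | grep -E 'min.insync.replicas|replication.factor'
```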
Topic config for one of the topics where we had this issue:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: app-to-websocket
  namespace: kafka-applications
  labels:
    strimzi.io/cluster: kafka
spec:
  partitions: 12
  replicas: 3
  config:
    retention.ms: "3600000" # 1 h
    segment.ms: "300000" # 5 min
and for __consumer_offsets (the default internal topic):
partitions: 50
replicas: 3
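It can also help to list which partitions are actually under-replicated at the moment the roller blocks, rather than relying only on the operator log. A sketch, with the same assumptions as above about the listener port and pod layout:

```bash
# List every partition whose ISR is currently smaller than its replica set
kubectl exec -n kafka-metapro kafka-metapro-kafka-0 -c kafka -- \
  /opt/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --describe --under-replicated-partitions

# Look specifically at __consumer_offsets partition 27, the one named in the logs
kubectl exec -n kafka-metapro kafka-metapro-kafka-0 -c kafka -- \
  /opt/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --describe --topic __consumer_offsets | grep 'Partition: 27'
```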
2021-12-16 16:05:21 INFO KafkaRoller:299 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): Could not roll pod 0 due to io.strimzi.operator.cluster.operator.resource.KafkaRoller$ForceableProblem: Pod kafka-metapro-kafka-0 is currently the controller and there are other pods still to roll, retrying after at least 250ms
2021-12-16 16:05:22 INFO AbstractOperator:466 - Reconciliation #69(timer) Kafka(kafka-customers-logs/kafka): reconciled
2021-12-16 16:05:22 INFO AbstractOperator:466 - Reconciliation #68(timer) Kafka(kafka-applications/kafka): reconciled
2021-12-16 16:05:22 INFO KafkaAvailability:135 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): __consumer_offsets/27 will be under-replicated (ISR={0}, replicas=[0,4,5], min.insync.replicas=1) if broker 0 is restarted.
2021-12-16 16:05:22 INFO KafkaRoller:299 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): Could not roll pod 0 due to io.strimzi.operator.cluster.operator.resource.KafkaRoller$UnforceableProblem: Pod kafka-metapro-kafka-0 is currently not rollable, retrying after at least 500ms
2021-12-16 16:05:22 INFO AbstractOperator:466 - Reconciliation #71(timer) Kafka(kafka-systems-logs/kafka): reconciled
2021-12-16 16:05:22 INFO KafkaAvailability:135 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): __consumer_offsets/27 will be under-replicated (ISR={0}, replicas=[0,4,5], min.insync.replicas=1) if broker 0 is restarted.
2021-12-16 16:05:22 INFO KafkaRoller:299 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): Could not roll pod 0 due to io.strimzi.operator.cluster.operator.resource.KafkaRoller$UnforceableProblem: Pod kafka-metapro-kafka-0 is currently not rollable, retrying after at least 1000ms
2021-12-16 16:05:24 INFO KafkaAvailability:135 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): __consumer_offsets/27 will be under-replicated (ISR={0}, replicas=[0,4,5], min.insync.replicas=1) if broker 0 is restarted.
2021-12-16 16:05:24 INFO KafkaRoller:299 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): Could not roll pod 0 due to io.strimzi.operator.cluster.operator.resource.KafkaRoller$UnforceableProblem: Pod kafka-metapro-kafka-0 is currently not rollable, retrying after at least 2000ms
2021-12-16 16:05:26 INFO KafkaAvailability:135 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): __consumer_offsets/27 will be under-replicated (ISR={0}, replicas=[0,4,5], min.insync.replicas=1) if broker 0 is restarted.
2021-12-16 16:05:26 INFO KafkaRoller:299 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): Could not roll pod 0 due to io.strimzi.operator.cluster.operator.resource.KafkaRoller$UnforceableProblem: Pod kafka-metapro-kafka-0 is currently not rollable, retrying after at least 4000ms
2021-12-16 16:05:30 INFO KafkaAvailability:135 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): __consumer_offsets/27 will be under-replicated (ISR={0}, replicas=[0,4,5], min.insync.replicas=1) if broker 0 is restarted.
2021-12-16 16:05:30 INFO KafkaRoller:299 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): Could not roll pod 0 due to io.strimzi.operator.cluster.operator.resource.KafkaRoller$UnforceableProblem: Pod kafka-metapro-kafka-0 is currently not rollable, retrying after at least 8000ms
2021-12-16 16:05:38 INFO KafkaAvailability:135 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): __consumer_offsets/27 will be under-replicated (ISR={0}, replicas=[0,4,5], min.insync.replicas=1) if broker 0 is restarted.
2021-12-16 16:05:38 INFO KafkaRoller:299 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): Could not roll pod 0 due to io.strimzi.operator.cluster.operator.resource.KafkaRoller$UnforceableProblem: Pod kafka-metapro-kafka-0 is currently not rollable, retrying after at least 16000ms
2021-12-16 16:05:54 INFO KafkaAvailability:135 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): __consumer_offsets/27 will be under-replicated (ISR={0}, replicas=[0,4,5], min.insync.replicas=1) if broker 0 is restarted.
2021-12-16 16:05:54 INFO KafkaRoller:299 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): Could not roll pod 0 due to io.strimzi.operator.cluster.operator.resource.KafkaRoller$UnforceableProblem: Pod kafka-metapro-kafka-0 is currently not rollable, retrying after at least 32000ms
2021-12-16 16:06:16 INFO AbstractOperator:363 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): Reconciliation is in progress
2021-12-16 16:06:27 INFO KafkaAvailability:135 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): __consumer_offsets/27 will be under-replicated (ISR={0}, replicas=[0,4,5], min.insync.replicas=1) if broker 0 is restarted.
2021-12-16 16:06:27 INFO KafkaRoller:299 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): Could not roll pod 0 due to io.strimzi.operator.cluster.operator.resource.KafkaRoller$UnforceableProblem: Pod kafka-metapro-kafka-0 is currently not rollable, retrying after at least 64000ms
2021-12-16 16:07:31 INFO KafkaRoller:292 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): Could not roll pod 0, giving up after 10 attempts. Total delay between attempts 127750ms
io.strimzi.operator.cluster.operator.resource.KafkaRoller$UnforceableProblem: Pod kafka-metapro-kafka-0 is currently not rollable
at io.strimzi.operator.cluster.operator.resource.KafkaRoller.restartIfNecessary(KafkaRoller.java:370) ~[io.strimzi.cluster-operator-0.26.1.jar:0.26.1]
at io.strimzi.operator.cluster.operator.resource.KafkaRoller.lambda$schedule$6(KafkaRoller.java:277) ~[io.strimzi.cluster-operator-0.26.1.jar:0.26.1]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:829) [?:?]
2021-12-16 16:07:31 ERROR AbstractOperator:240 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): createOrUpdate failed
io.strimzi.operator.cluster.operator.resource.KafkaRoller$UnforceableProblem: Pod kafka-metapro-kafka-0 is currently not rollable
at io.strimzi.operator.cluster.operator.resource.KafkaRoller.restartIfNecessary(KafkaRoller.java:370) ~[io.strimzi.cluster-operator-0.26.1.jar:0.26.1]
at io.strimzi.operator.cluster.operator.resource.KafkaRoller.lambda$schedule$6(KafkaRoller.java:277) ~[io.strimzi.cluster-operator-0.26.1.jar:0.26.1]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:829) [?:?]
Additional context: migration from 0.25 to 0.26.1.
Top GitHub Comments
Hello, I'm posting quick feedback so the issue can be closed. We identified that the internal topic
__strimzi-topic-operator-kstreams-topic-store-changelog
had replicas set to 1; I think this setting came from a previous version (0.21 -> 0.22?). We set replicas to 3 and ran a partition reassignment. I don't know whether it was the root cause, but we have not identified any other sync issue in the migrations we ran after this change.
Thanks
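For anyone hitting the same situation: increasing the replication factor of an existing topic is done through a partition reassignment. A minimal sketch of what that could look like for the changelog topic (the broker IDs and the single partition 0 below are illustrative placeholders, not what was actually run; a real reassignment file needs one entry per partition, and the pod/namespace names are taken from the logs above):

```bash
# Build a reassignment file that gives the topic 3 replicas per partition
cat > reassignment.json <<'EOF'
{
  "version": 1,
  "partitions": [
    { "topic": "__strimzi-topic-operator-kstreams-topic-store-changelog",
      "partition": 0,
      "replicas": [0, 1, 2] }
  ]
}
EOF

# Copy it into a broker pod and execute the reassignment
kubectl exec -n kafka-metapro -i kafka-metapro-kafka-0 -c kafka -- \
  bash -c 'cat > /tmp/reassignment.json' < reassignment.json
kubectl exec -n kafka-metapro kafka-metapro-kafka-0 -c kafka -- \
  /opt/kafka/bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --reassignment-json-file /tmp/reassignment.json --execute

# Check that the reassignment finished
kubectl exec -n kafka-metapro kafka-metapro-kafka-0 -c kafka -- \
  /opt/kafka/bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --reassignment-json-file /tmp/reassignment.json --verify
```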
Well, you are right that I would not necessarily expect the replicas to be out of sync. All I meant was that the operator algorithm seems to work as intended here.
But I do not really know the cluster, so it is hard for me to speculate about the reasons. It could have been out of sync already before. Or it could be related to a recent restart - but in my experience that usually syncs up fairly quickly for the consumer offsets topic. It could also just be slow networking, and so on. So it is hard to say what is causing it.
Kafka logs might say more. I'm not really an expert on Kafka itself, so I do not have any pointers to what exactly to look for. But maybe if you share the logs, others might have some idea.
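If it happens again, the broker logs are probably the first place to look: the partition leader logs whenever it shrinks or expands the ISR, which should show when and why brokers 4 and 5 dropped out of the ISR for __consumer_offsets-27. A sketch of how one might pull that out of the Strimzi pods (pod and namespace names are taken from the logs above; the message wording assumes the default broker logging configuration):

```bash
# Look for ISR shrink/expand events on the brokers that hold __consumer_offsets-27
for pod in kafka-metapro-kafka-0 kafka-metapro-kafka-4 kafka-metapro-kafka-5; do
  echo "== $pod =="
  kubectl logs -n kafka-metapro "$pod" -c kafka \
    | grep -Ei 'shrinking isr|expanding isr' | tail -n 20
done
```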