
Pod could not be rotated due to under-replicated partition

See original GitHub issue

Describe the bug

Unable to rotate a pod during a Strimzi upgrade due to an unexpected under-replicated partition error.

To Reproduce

  1. Upgrade from 0.25 to 0.26.1
  2. The error occurs randomly and is difficult to reproduce

Expected behavior

Hello, when upgrading to version 0.26.1, we had several cases of a pod rollout being blocked by an under-replication error that does not seem expected given the configuration in place. The problem occurred with random topics, but also with the __consumer_offsets topic.

Environment (please complete the following information):

  • Strimzi version: 0.26.1
  • Installation method: Helm chart
  • Kubernetes cluster: Kubernetes 1.20
  • Infrastructure: Amazon EKS

YAML files and logs

Cluster config (topic-related defaults):

    config:
      auto.create.topics.enable: 'false'
      num.partitions: 12
      default.replication.factor: 3
      min.insync.replicas: 1
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 1
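
As a sanity check that these values are what the brokers actually run with, a minimal sketch using the Kafka CLI from a throwaway pod (the namespace, image tag, and bootstrap address are assumptions inferred from the versions and cluster name appearing later in this issue):

    # Describe broker 0's effective configuration, including defaults.
    # Adjust namespace, image tag, and bootstrap address to your cluster.
    kubectl -n kafka-metapro run kafka-diag -i --rm --restart=Never \
      --image=quay.io/strimzi/kafka:0.26.1-kafka-3.0.0 -- \
      bin/kafka-configs.sh \
        --bootstrap-server kafka-metapro-kafka-bootstrap:9092 \
        --entity-type brokers --entity-name 0 --describe --all

Grepping the output for min.insync.replicas and the replication.factor settings confirms whether the values from the custom resource were applied.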

Topic conf for one of the topics where we had this issue:

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: app-to-websocket
  namespace: kafka-applications
  labels:
    strimzi.io/cluster: kafka
spec:
  partitions: 12
  replicas: 3
  config:
    retention.ms: "3600000" # 1 hour
    segment.ms: "300000"    # 5 minutes

and for __consumer_offsets (the default config):

  partitions: 50
  replicas: 3

Cluster Operator logs:

2021-12-16 16:05:21 INFO  KafkaRoller:299 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): Could not roll pod 0 due to io.strimzi.operator.cluster.operator.resource.KafkaRoller$ForceableProblem: Pod kafka-metapro-kafka-0 is currently the controller and there are other pods still to roll, retrying after at least 250ms
2021-12-16 16:05:22 INFO  AbstractOperator:466 - Reconciliation #69(timer) Kafka(kafka-customers-logs/kafka): reconciled
2021-12-16 16:05:22 INFO  AbstractOperator:466 - Reconciliation #68(timer) Kafka(kafka-applications/kafka): reconciled
2021-12-16 16:05:22 INFO  KafkaAvailability:135 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): __consumer_offsets/27 will be under-replicated (ISR={0}, replicas=[0,4,5], min.insync.replicas=1) if broker 0 is restarted.
2021-12-16 16:05:22 INFO  KafkaRoller:299 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): Could not roll pod 0 due to io.strimzi.operator.cluster.operator.resource.KafkaRoller$UnforceableProblem: Pod kafka-metapro-kafka-0 is currently not rollable, retrying after at least 500ms
2021-12-16 16:05:22 INFO  AbstractOperator:466 - Reconciliation #71(timer) Kafka(kafka-systems-logs/kafka): reconciled
2021-12-16 16:05:22 INFO  KafkaAvailability:135 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): __consumer_offsets/27 will be under-replicated (ISR={0}, replicas=[0,4,5], min.insync.replicas=1) if broker 0 is restarted.
2021-12-16 16:05:22 INFO  KafkaRoller:299 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): Could not roll pod 0 due to io.strimzi.operator.cluster.operator.resource.KafkaRoller$UnforceableProblem: Pod kafka-metapro-kafka-0 is currently not rollable, retrying after at least 1000ms
2021-12-16 16:05:24 INFO  KafkaAvailability:135 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): __consumer_offsets/27 will be under-replicated (ISR={0}, replicas=[0,4,5], min.insync.replicas=1) if broker 0 is restarted.
2021-12-16 16:05:24 INFO  KafkaRoller:299 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): Could not roll pod 0 due to io.strimzi.operator.cluster.operator.resource.KafkaRoller$UnforceableProblem: Pod kafka-metapro-kafka-0 is currently not rollable, retrying after at least 2000ms
2021-12-16 16:05:26 INFO  KafkaAvailability:135 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): __consumer_offsets/27 will be under-replicated (ISR={0}, replicas=[0,4,5], min.insync.replicas=1) if broker 0 is restarted.
2021-12-16 16:05:26 INFO  KafkaRoller:299 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): Could not roll pod 0 due to io.strimzi.operator.cluster.operator.resource.KafkaRoller$UnforceableProblem: Pod kafka-metapro-kafka-0 is currently not rollable, retrying after at least 4000ms
2021-12-16 16:05:30 INFO  KafkaAvailability:135 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): __consumer_offsets/27 will be under-replicated (ISR={0}, replicas=[0,4,5], min.insync.replicas=1) if broker 0 is restarted.
2021-12-16 16:05:30 INFO  KafkaRoller:299 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): Could not roll pod 0 due to io.strimzi.operator.cluster.operator.resource.KafkaRoller$UnforceableProblem: Pod kafka-metapro-kafka-0 is currently not rollable, retrying after at least 8000ms
2021-12-16 16:05:38 INFO  KafkaAvailability:135 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): __consumer_offsets/27 will be under-replicated (ISR={0}, replicas=[0,4,5], min.insync.replicas=1) if broker 0 is restarted.
2021-12-16 16:05:38 INFO  KafkaRoller:299 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): Could not roll pod 0 due to io.strimzi.operator.cluster.operator.resource.KafkaRoller$UnforceableProblem: Pod kafka-metapro-kafka-0 is currently not rollable, retrying after at least 16000ms
2021-12-16 16:05:54 INFO  KafkaAvailability:135 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): __consumer_offsets/27 will be under-replicated (ISR={0}, replicas=[0,4,5], min.insync.replicas=1) if broker 0 is restarted.
2021-12-16 16:05:54 INFO  KafkaRoller:299 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): Could not roll pod 0 due to io.strimzi.operator.cluster.operator.resource.KafkaRoller$UnforceableProblem: Pod kafka-metapro-kafka-0 is currently not rollable, retrying after at least 32000ms
2021-12-16 16:06:16 INFO  AbstractOperator:363 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): Reconciliation is in progress
2021-12-16 16:06:27 INFO  KafkaAvailability:135 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): __consumer_offsets/27 will be under-replicated (ISR={0}, replicas=[0,4,5], min.insync.replicas=1) if broker 0 is restarted.
2021-12-16 16:06:27 INFO  KafkaRoller:299 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): Could not roll pod 0 due to io.strimzi.operator.cluster.operator.resource.KafkaRoller$UnforceableProblem: Pod kafka-metapro-kafka-0 is currently not rollable, retrying after at least 64000ms


2021-12-16 16:07:31 INFO  KafkaRoller:292 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): Could not roll pod 0, giving up after 10 attempts. Total delay between attempts 127750ms
io.strimzi.operator.cluster.operator.resource.KafkaRoller$UnforceableProblem: Pod kafka-metapro-kafka-0 is currently not rollable
	at io.strimzi.operator.cluster.operator.resource.KafkaRoller.restartIfNecessary(KafkaRoller.java:370) ~[io.strimzi.cluster-operator-0.26.1.jar:0.26.1]
	at io.strimzi.operator.cluster.operator.resource.KafkaRoller.lambda$schedule$6(KafkaRoller.java:277) ~[io.strimzi.cluster-operator-0.26.1.jar:0.26.1]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]
2021-12-16 16:07:31 ERROR AbstractOperator:240 - Reconciliation #70(timer) Kafka(kafka-metapro/kafka-metapro): createOrUpdate failed
io.strimzi.operator.cluster.operator.resource.KafkaRoller$UnforceableProblem: Pod kafka-metapro-kafka-0 is currently not rollable
	at io.strimzi.operator.cluster.operator.resource.KafkaRoller.restartIfNecessary(KafkaRoller.java:370) ~[io.strimzi.cluster-operator-0.26.1.jar:0.26.1]
	at io.strimzi.operator.cluster.operator.resource.KafkaRoller.lambda$schedule$6(KafkaRoller.java:277) ~[io.strimzi.cluster-operator-0.26.1.jar:0.26.1]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]
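
Reading the excerpt: partition 27 of __consumer_offsets is assigned to brokers [0,4,5], but only broker 0 is in the ISR, so restarting broker 0 would leave the partition with zero in-sync replicas. That is why KafkaAvailability keeps vetoing the roll even though min.insync.replicas is 1. The final failure's arithmetic is also consistent: the nine waits of 250, 500, ..., 64000 ms double each time and sum to 250 * (2^9 - 1) = 127750 ms, matching the reported "Total delay between attempts" across 10 attempts.

To see which partitions are in this state while a roll is blocked, a minimal diagnostic sketch (namespace, image tag, and bootstrap address are assumptions, as above):

    # List every partition whose ISR is smaller than its replica set.
    kubectl -n kafka-metapro run kafka-diag -i --rm --restart=Never \
      --image=quay.io/strimzi/kafka:0.26.1-kafka-3.0.0 -- \
      bin/kafka-topics.sh \
        --bootstrap-server kafka-metapro-kafka-bootstrap:9092 \
        --describe --under-replicated-partitions

An empty result means the ISR has caught up, and the roller should succeed on its next retry.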

Additional context

Migration from 0.25 to 0.26.1.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
otassetti-talend commented, Jan 4, 2022

Hello, I’m posting quick feedback to close the issue. We identified that the internal topic __strimzi-topic-operator-kstreams-topic-store-changelog had replicas set to 1; I think this setting came from a previous version (0.21 -> 0.22?). So we set replicas to 3 and did a repartitioning.

I don’t know if it was the root cause, but we didn’t identify any other sync issue in the migrations we did after this change.

Thanks
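
For anyone following the same path, a hedged sketch of the kind of reassignment described above. The broker IDs and the single partition are assumptions; confirm the topic's actual layout with kafka-topics.sh --describe before executing. First, save a reassignment file such as /tmp/reassign.json:

    {
      "version": 1,
      "partitions": [
        {
          "topic": "__strimzi-topic-operator-kstreams-topic-store-changelog",
          "partition": 0,
          "replicas": [0, 1, 2]
        }
      ]
    }

Then apply it (bootstrap address is again an assumption):

    # Apply the reassignment so the internal topic gets 3 replicas.
    bin/kafka-reassign-partitions.sh \
      --bootstrap-server kafka-metapro-kafka-bootstrap:9092 \
      --reassignment-json-file /tmp/reassign.json --execute

Re-running the same command with --verify instead of --execute reports when the reassignment has completed.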

1 reaction
scholzj commented, Dec 16, 2021

Well, you are right that I would not necessarily expect the replicas to be out of sync. All I meant was that the operator algorithm seems to be working as intended here.

But I do not really know the cluster, so it is hard for me to speculate about the reasons. It could have been out of sync already before. Or it could be related to a recent restart, but in my experience that usually syncs up fairly quickly for the consumer offsets topic. It could also just be slow networking, and so on. So it is hard to say what is causing it.

Kafka logs might say more. I’m not really an expert on Kafka itself, so I do not have any pointers on what exactly to look for. But maybe if you share the logs, others might have some ideas.
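
In that spirit, one concrete place to start is the broker's own ISR activity around the failed roll; a minimal sketch (pod, container, and namespace names follow Strimzi's conventions for the cluster in this issue):

    # Grep broker 0's log for ISR shrink/expand events, which record when and
    # why followers fell out of sync on partitions this broker leads.
    kubectl -n kafka-metapro logs kafka-metapro-kafka-0 -c kafka \
      | grep -E 'Shrinking ISR|Expanding ISR'

Repeated shrink/expand cycles for the same partition usually point at a slow or flapping follower rather than at the operator.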
