[Bug] Broker rolling upgrade loop when default value is set
Describe the bug
When initialDelaySeconds is set to 0 (instead of being omitted), the broker gets stuck in a rolling upgrade spin loop.
# Source: kafka/templates/kafkacluster.yml
apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: kafka-cluster
spec:
  kafka:
    replicas: 5
    readinessProbe:
      initialDelaySeconds: 0
To Reproduce
Steps to reproduce the behavior:
- Set broker livenessProbe.initialDelaySeconds to 0;
- The broker keeps upgrading indefinitely, with the STS generation continually being updated.
Operator logs for spin loop
2020-12-03 05:38:51 DEBUG KafkaAssemblyOperator:3264 - Reconciliation #1(watch) Kafka(kafka/kafka-cluster): Rolling pod kafka-cluster-kafka-0 due to [Pod has old generation]
2020-12-03 05:38:51 DEBUG KafkaRoller:687 - Reconciliation #1(watch) Kafka(kafka/kafka-cluster): Creating AdminClient for kafka-cluster-kafka-0.kafka-cluster-kafka-brokers.kafka.svc.cluster.local:9091,kafka-cluster-kafka-1.kafka-cluster-kafka-brokers.kafka.svc.cluster.local:9091,kafka-cluster-kafka-2.kafka-cluster-kafka-brokers.kafka.svc.cluster.local:9091,kafka-cluster-kafka-3.kafka-cluster-kafka-brokers.kafka.svc.cluster.local:9091,kafka-cluster-kafka-4.kafka-cluster-kafka-brokers.kafka.svc.cluster.local:9091
2020-12-03 05:38:53 INFO KafkaRoller:500 - Reconciliation #1(watch) Kafka(kafka/kafka-cluster): Pod 0 needs to be restarted. Reason: [Pod has old generation]
Expected behavior
- no spin loop
Environment (please complete the following information):
- Strimzi version: 0.20.0
- Installation method: helm charts
- Kubernetes cluster: eks 1.16
- Infrastructure: aws eks
It’s believed that when a field is explicitly set to its default value, the k8s API returns null instead of the actual default, which Strimzi detects as a diff and which then triggers a rolling upgrade.
This might generalize to any field that is set to its default value and for which k8s returns null or omits the field from the API response.
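The suspected mechanism can be illustrated with a minimal Python sketch. It is not Strimzi's code (Strimzi is written in Java). It assumes, as suggested above, that the API server omits the zero-valued field from its response. The json_diff helper and the timeoutSeconds value are hypothetical and only serve the illustration.

```python
def json_diff(current, desired, path=""):
    """Return JSON-Patch-like ops that turn `current` into `desired` (nested dicts only)."""
    ops = []
    for key in sorted(set(current) | set(desired)):
        # JSON Pointer escaping: "~" becomes "~0" and "/" becomes "~1".
        pointer = path + "/" + key.replace("~", "~0").replace("/", "~1")
        if key not in current:
            ops.append({"op": "add", "path": pointer, "value": desired[key]})
        elif key not in desired:
            ops.append({"op": "remove", "path": pointer})
        elif isinstance(current[key], dict) and isinstance(desired[key], dict):
            ops.extend(json_diff(current[key], desired[key], pointer))
        elif current[key] != desired[key]:
            ops.append({"op": "replace", "path": pointer, "value": desired[key]})
    return ops


# Desired state: the user explicitly set initialDelaySeconds to 0.
desired = {"readinessProbe": {"initialDelaySeconds": 0, "timeoutSeconds": 5}}
# Assumed API response: the zero-valued field is left out.
current = {"readinessProbe": {"timeoutSeconds": 5}}

ops = json_diff(current, desired)
print(ops)
# [{'op': 'add', 'path': '/readinessProbe/initialDelaySeconds', 'value': 0}]
```

Under that assumption the "add" op reappears on every reconciliation, because patching the resource never makes the field show up in the next read.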
Operator Log
2020-12-03 05:38:51 DEBUG KafkaSetOperator:102 - StatefulSet kafka/kafka-cluster-kafka already exists, patching it
2020-12-03 05:38:51 DEBUG StatefulSetDiff:86 - StatefulSet kafka/kafka-cluster-kafka ignoring diff {"op":"remove","path":"/spec/revisionHistoryLimit"}
2020-12-03 05:38:51 DEBUG StatefulSetDiff:86 - StatefulSet kafka/kafka-cluster-kafka ignoring diff {"op":"remove","path":"/spec/template/metadata/annotations/strimzi.io~1generation"}
2020-12-03 05:38:51 DEBUG StatefulSetDiff:102 - StatefulSet kafka/kafka-cluster-kafka differs: {"op":"add","path":"/spec/template/spec/containers/0/livenessProbe/initialDelaySeconds","value":0}
2020-12-03 05:38:51 DEBUG StatefulSetDiff:103 - Current StatefulSet path /spec/template/spec/containers/0/livenessProbe/initialDelaySeconds has value
2020-12-03 05:38:51 DEBUG StatefulSetDiff:104 - Desired StatefulSet path /spec/template/spec/containers/0/livenessProbe/initialDelaySeconds has value 0
2020-12-03 05:38:51 DEBUG StatefulSetDiff:102 - StatefulSet kafka/kafka-cluster-kafka differs: {"op":"add","path":"/spec/template/spec/containers/0/readinessProbe/initialDelaySeconds","value":0}
2020-12-03 05:38:51 DEBUG StatefulSetDiff:103 - Current StatefulSet path /spec/template/spec/containers/0/readinessProbe/initialDelaySeconds has value
2020-12-03 05:38:51 DEBUG StatefulSetDiff:104 - Desired StatefulSet path /spec/template/spec/containers/0/readinessProbe/initialDelaySeconds has value 0
2020-12-03 05:38:51 DEBUG StatefulSetDiff:86 - StatefulSet kafka/kafka-cluster-kafka ignoring diff {"op":"remove","path":"/spec/template/spec/containers/0/terminationMessagePath"}
2020-12-03 05:38:51 DEBUG StatefulSetDiff:86 - StatefulSet kafka/kafka-cluster-kafka ignoring diff {"op":"remove","path":"/spec/template/spec/containers/0/terminationMessagePolicy"}
2020-12-03 05:38:51 DEBUG StatefulSetDiff:86 - StatefulSet kafka/kafka-cluster-kafka ignoring diff {"op":"remove","path":"/spec/template/spec/dnsPolicy"}
2020-12-03 05:38:51 DEBUG StatefulSetDiff:86 - StatefulSet kafka/kafka-cluster-kafka ignoring diff {"op":"remove","path":"/spec/template/spec/initContainers/0/env/0/valueFrom/fieldRef/apiVersion"}
2020-12-03 05:38:51 DEBUG StatefulSetDiff:86 - StatefulSet kafka/kafka-cluster-kafka ignoring diff {"op":"remove","path":"/spec/template/spec/initContainers/0/terminationMessagePath"}
2020-12-03 05:38:51 DEBUG StatefulSetDiff:86 - StatefulSet kafka/kafka-cluster-kafka ignoring diff {"op":"remove","path":"/spec/template/spec/initContainers/0/terminationMessagePolicy"}
2020-12-03 05:38:51 DEBUG StatefulSetDiff:86 - StatefulSet kafka/kafka-cluster-kafka ignoring diff {"op":"remove","path":"/spec/template/spec/restartPolicy"}
2020-12-03 05:38:51 DEBUG StatefulSetDiff:86 - StatefulSet kafka/kafka-cluster-kafka ignoring diff {"op":"remove","path":"/spec/template/spec/serviceAccount"}
2020-12-03 05:38:51 DEBUG StatefulSetDiff:86 - StatefulSet kafka/kafka-cluster-kafka ignoring diff {"op":"remove","path":"/spec/template/spec/volumes/4/configMap/defaultMode"}
2020-12-03 05:38:51 DEBUG StatefulSetDiff:86 - StatefulSet kafka/kafka-cluster-kafka ignoring diff {"op":"remove","path":"/spec/volumeClaimTemplates/0/spec/volumeMode"}
2020-12-03 05:38:51 DEBUG StatefulSetDiff:86 - StatefulSet kafka/kafka-cluster-kafka ignoring diff {"op":"remove","path":"/spec/volumeClaimTemplates/0/status"}
2020-12-03 05:38:51 DEBUG StatefulSetDiff:86 - StatefulSet kafka/kafka-cluster-kafka ignoring diff {"op":"remove","path":"/status"}
2020-12-03 05:38:51 DEBUG KafkaSetOperator:54 - Changed template spec => needs rolling update
2020-12-03 05:38:51 DEBUG StatefulSetOperator:305 - Patching StatefulSet kafka/kafka-cluster-kafka
2020-12-03 05:38:51 DEBUG KafkaSetOperator:168 - StatefulSet kafka-cluster-kafka in namespace kafka has been patched
2020-12-03 05:38:51 DEBUG KafkaAssemblyOperator:927 - Kafka.spec.kafka.version unchanged
2020-12-03 05:38:51 DEBUG KafkaRoller:203 - Reconciliation #1(watch) Kafka(kafka/kafka-cluster): Initial order for rolling restart [0, 1, 2, 3, 4]
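Most of the lines above are "ignoring diff" entries, so the diffs that actually trigger the roll are easy to miss. The following Python sketch filters them out. It assumes the log line layout shown in this report (a "differs:" marker followed by a JSON object), and the effective_diffs helper is a hypothetical name, not part of any Strimzi tooling.

```python
import json
import re

# Matches only the lines where StatefulSetDiff reports a diff it does not ignore.
DIFF_LINE = re.compile(r"StatefulSetDiff:\d+ - StatefulSet (\S+) differs: (\{.*\})\s*$")


def effective_diffs(log_text):
    """Return (statefulset, patch_op) pairs for every non-ignored diff in the log."""
    found = []
    for line in log_text.splitlines():
        match = DIFF_LINE.search(line)
        if match:
            found.append((match.group(1), json.loads(match.group(2))))
    return found


# Two lines copied from the operator log in this report.
sample = """
2020-12-03 05:38:51 DEBUG StatefulSetDiff:86 - StatefulSet kafka/kafka-cluster-kafka ignoring diff {"op":"remove","path":"/spec/revisionHistoryLimit"}
2020-12-03 05:38:51 DEBUG StatefulSetDiff:102 - StatefulSet kafka/kafka-cluster-kafka differs: {"op":"add","path":"/spec/template/spec/containers/0/livenessProbe/initialDelaySeconds","value":0}
"""

for name, op in effective_diffs(sample):
    print(name, op["op"], op["path"], op.get("value"))
```

Run against the full log, this would leave only the two initialDelaySeconds entries (liveness and readiness probe).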
Issue Analytics
- State:
- Created 3 years ago
- Reactions:1
- Comments:5 (3 by maintainers)
Top GitHub Comments
I opened a PR for this.
Kubernetes is not consistent in how it handles some of these situations. Different fields use different default values, etc. So this is something that really needs to be handled field by field.
Sorry for mixing up the default values. I feel like this bug could be more general than just initialDelaySeconds, and any field with a default value could be affected. In terms of fixes, it would be great if it could be fixed programmatically, but even just documenting it would be a good start.
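One way to picture a programmatic, field-by-field fix is a small table of values that should be treated the same as an absent field, applied to both sides before comparing. The Python sketch below is only an illustration under that assumption. It is not the change made in the PR mentioned above, and the ABSENT_EQUIVALENTS table and strip_absent_equivalents helper are hypothetical names.

```python
import copy

# Hypothetical per-field table: (section, field) -> value treated like "not set".
# Each entry has to be verified individually, since defaults differ per field.
ABSENT_EQUIVALENTS = {
    ("livenessProbe", "initialDelaySeconds"): 0,
    ("readinessProbe", "initialDelaySeconds"): 0,
}


def strip_absent_equivalents(container):
    """Return a copy of `container` without fields whose value equals "absent"."""
    result = copy.deepcopy(container)
    for (section_name, field), value in ABSENT_EQUIVALENTS.items():
        section = result.get(section_name)
        if isinstance(section, dict) and field in section and section[field] == value:
            del section[field]
    return result


desired = {"readinessProbe": {"initialDelaySeconds": 0, "timeoutSeconds": 5}}
current = {"readinessProbe": {"timeoutSeconds": 5}}

# After normalization both sides compare equal, so no rolling update is needed.
print(strip_absent_equivalents(desired) == strip_absent_equivalents(current))  # True
```

A non-zero value such as initialDelaySeconds: 15 is left untouched, so genuine changes would still be detected.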