[Bug] Failed to reconcile the zookeeper-nodes service in a dual-stack Kubernetes cluster
Describe the bug
The strimzi-kafka-operator fails to PATCH the zookeeper-nodes service. In a dual-stack Kubernetes cluster the Service's spec.ipFamily field is populated by the API server and is immutable afterwards, so the operator's patch, which omits the field, is rejected.
https://kubernetes.io/docs/concepts/services-networking/dual-stack/
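For context, on a dual-stack cluster the API server fills in spec.ipFamily when a Service is created, and the field can neither be changed nor cleared afterwards. The live headless ZooKeeper Service therefore looks roughly like the sketch below (shape only; the selector and port details are illustrative, not copied from the cluster). Any later PATCH that does not carry ipFamily forward is rejected with the 422 shown in the logs further down.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: euler-kafka-zookeeper-nodes
  namespace: euler
spec:
  clusterIP: None   # headless service
  ipFamily: IPv4    # defaulted by the API server at creation; immutable afterwards
  selector:
    strimzi.io/name: euler-kafka-zookeeper   # illustrative selector
  ports:
    - name: tcp-clients      # illustrative port names
      port: 2181
    - name: tcp-clustering
      port: 2888
    - name: tcp-election
      port: 3888
```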
To Reproduce
Steps to reproduce the behavior:
- create a dual-stack Kubernetes cluster (see the kubeadm sketch after this list)
- install the strimzi-kafka-operator (v0.19.0)
- create a Kafka cluster (creation succeeds, but every subsequent reconciliation fails)
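For reference, one way to stand up a dual-stack 1.18 cluster is kubeadm with the IPv6DualStack feature gate and dual CIDRs; a minimal sketch (the CIDR values are illustrative, not the ones used in this report):

```yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
featureGates:
  IPv6DualStack: true                            # alpha feature gate in Kubernetes 1.18
networking:
  podSubnet: 10.244.0.0/16,fd00:10:244::/56      # IPv4,IPv6 pod CIDRs
  serviceSubnet: 10.96.0.0/12,fd00:10:96::/112   # IPv4,IPv6 service CIDRs
```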
Environment:
- Strimzi version: [0.19.0]
- Installation method: [Helm]
- Kubernetes cluster: [Kubernetes 1.18.4 (dual-stack)]
YAML files and logs
cluster.yaml:

apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: euler-kafka
  namespace: euler
spec:
  cruiseControl:
    brokerCapacity:
      cpuUtilization: 80
      disk: 10Gi
      inboundNetwork: 50MiB/s
      outboundNetwork: 50MiB/s
    config:
      default.replica.movement.strategies: com.linkedin.kafka.cruisecontrol.executor.strategy.PostponeUrpReplicaMovementStrategy
    resources:
      limits:
        memory: 1Gi
      requests:
        cpu: 500m
        memory: 512Mi
    template:
      pod:
        securityContext:
          fsGroup: 1000
          runAsGroup: 1000
          runAsUser: 1000
  entityOperator:
    template:
      pod:
        securityContext:
          fsGroup: 1000
          runAsGroup: 1000
          runAsUser: 1000
    topicOperator:
      resources:
        limits:
          memory: 1Gi
        requests:
          cpu: 1000m
          memory: 512Mi
    userOperator:
      livenessProbe:
        initialDelaySeconds: 60
      readinessProbe:
        initialDelaySeconds: 60
      resources:
        limits:
          cpu: 100m
          memory: 256Mi
  kafka:
    config:
      auto.create.topics.enable: "false"
      log.message.format.version: "2.5"
      num.recovery.threads.per.data.dir: 2
      offsets.topic.replication.factor: 3
      socket.receive.buffer.bytes: -1
      socket.send.buffer.bytes: -1
      transaction.state.log.min.isr: 2
      transaction.state.log.replication.factor: 3
    jvmOptions:
      -Xms: 1024m
      -Xmx: 1024m
    listeners:
      plain: {}
      tls: {}
    metrics:
      lowercaseOutputName: true
      rules:
        - labels:
            clientId: $3
            partition: $5
            topic: $4
          name: kafka_server_$1_$2
          pattern: kafka.server<type=(.+), name=(.+), clientId=(.+), topic=(.+), partition=(.*)><>Value
          type: GAUGE
        - labels:
            broker: $4:$5
            clientId: $3
          name: kafka_server_$1_$2
          pattern: kafka.server<type=(.+), name=(.+), clientId=(.+), brokerHost=(.+), brokerPort=(.+)><>Value
          type: GAUGE
        - labels:
            cipher: $5
            listener: $2
            networkProcessor: $3
            protocol: $4
          name: kafka_server_$1_connections_tls_info
          pattern: kafka.server<type=(.+), cipher=(.+), protocol=(.+), listener=(.+), networkProcessor=(.+)><>connections
          type: GAUGE
        - labels:
            clientSoftwareName: $2
            clientSoftwareVersion: $3
            listener: $4
            networkProcessor: $5
          name: kafka_server_$1_connections_software
          pattern: kafka.server<type=(.+), clientSoftwareName=(.+), clientSoftwareVersion=(.+), listener=(.+), networkProcessor=(.+)><>connections
          type: GAUGE
        - labels:
            listener: $2
            networkProcessor: $3
          name: kafka_server_$1_$4
          pattern: 'kafka.server<type=(.+), listener=(.+), networkProcessor=(.+)><>(.+):'
          type: GAUGE
        - labels:
            listener: $2
            networkProcessor: $3
          name: kafka_server_$1_$4
          pattern: kafka.server<type=(.+), listener=(.+), networkProcessor=(.+)><>(.+)
          type: GAUGE
        - name: kafka_$1_$2_$3_percent
          pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*><>MeanRate
          type: GAUGE
        - name: kafka_$1_$2_$3_percent
          pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*><>Value
          type: GAUGE
        - labels:
            $4: $5
          name: kafka_$1_$2_$3_percent
          pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*, (.+)=(.+)><>Value
          type: GAUGE
        - labels:
            $4: $5
            $6: $7
          name: kafka_$1_$2_$3_total
          pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, (.+)=(.+), (.+)=(.+)><>Count
          type: COUNTER
        - labels:
            $4: $5
          name: kafka_$1_$2_$3_total
          pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, (.+)=(.+)><>Count
          type: COUNTER
        - name: kafka_$1_$2_$3_total
          pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*><>Count
          type: COUNTER
        - labels:
            $4: $5
            $6: $7
          name: kafka_$1_$2_$3
          pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+), (.+)=(.+)><>Value
          type: GAUGE
        - labels:
            $4: $5
          name: kafka_$1_$2_$3
          pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+)><>Value
          type: GAUGE
        - name: kafka_$1_$2_$3
          pattern: kafka.(\w+)<type=(.+), name=(.+)><>Value
          type: GAUGE
        - labels:
            $4: $5
            $6: $7
          name: kafka_$1_$2_$3_count
          pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+), (.+)=(.+)><>Count
          type: COUNTER
        - labels:
            $4: $5
            $6: $7
            quantile: 0.$8
          name: kafka_$1_$2_$3
          pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.*), (.+)=(.+)><>(\d+)thPercentile
          type: GAUGE
        - labels:
            $4: $5
          name: kafka_$1_$2_$3_count
          pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+)><>Count
          type: COUNTER
        - labels:
            $4: $5
            quantile: 0.$6
          name: kafka_$1_$2_$3
          pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.*)><>(\d+)thPercentile
          type: GAUGE
        - name: kafka_$1_$2_$3_count
          pattern: kafka.(\w+)<type=(.+), name=(.+)><>Count
          type: COUNTER
        - labels:
            quantile: 0.$4
          name: kafka_$1_$2_$3
          pattern: kafka.(\w+)<type=(.+), name=(.+)><>(\d+)thPercentile
          type: GAUGE
    rack:
      topologyKey: kubernetes.io/hostname
    replicas: 3
    resources:
      limits:
        memory: 6Gi
      requests:
        cpu: 2000m
        memory: 6Gi
    storage:
      type: jbod
      volumes:
        - class: drbd
          deleteClaim: false
          id: 0
          size: 10Gi
          type: persistent-claim
    template:
      pod:
        securityContext:
          fsGroup: 1000
          runAsGroup: 1000
          runAsUser: 1000
          sysctls:
            - name: net.ipv4.tcp_syncookies
              value: "0"
    version: 2.5.0
  kafkaExporter:
    groupRegex: .*
    resources:
      limits:
        cpu: 500m
        memory: 256Mi
    template:
      pod:
        securityContext:
          fsGroup: 1000
          runAsGroup: 1000
          runAsUser: 1000
    topicRegex: .*
  zookeeper:
    jvmOptions:
      -Xms: 1024m
      -Xmx: 1024m
    metrics:
      lowercaseOutputName: true
      rules:
        - name: zookeeper_$2
          pattern: org.apache.ZooKeeperService<name0=ReplicatedServer_id(\d+)><>(\w+)
          type: GAUGE
        - labels:
            replicaId: $2
          name: zookeeper_$3
          pattern: org.apache.ZooKeeperService<name0=ReplicatedServer_id(\d+), name1=replica.(\d+)><>(\w+)
          type: GAUGE
        - labels:
            memberType: $3
            replicaId: $2
          name: zookeeper_$4
          pattern: org.apache.ZooKeeperService<name0=ReplicatedServer_id(\d+), name1=replica.(\d+), name2=(\w+)><>(Packets\w+)
          type: COUNTER
        - labels:
            memberType: $3
            replicaId: $2
          name: zookeeper_$4
          pattern: org.apache.ZooKeeperService<name0=ReplicatedServer_id(\d+), name1=replica.(\d+), name2=(\w+)><>(\w+)
          type: GAUGE
        - labels:
            memberType: $3
            replicaId: $2
          name: zookeeper_$4_$5
          pattern: org.apache.ZooKeeperService<name0=ReplicatedServer_id(\d+), name1=replica.(\d+), name2=(\w+), name3=(\w+)><>(\w+)
          type: GAUGE
    replicas: 3
    resources:
      limits:
        memory: 4Gi
      requests:
        cpu: 1000m
        memory: 4Gi
    storage:
      class: drbd
      deleteClaim: false
      size: 5Gi
      type: persistent-claim
    template:
      pod:
        securityContext:
          fsGroup: 1000
          runAsGroup: 1000
          runAsUser: 1000
operator logs:
2020-10-05 03:23:26 ERROR AbstractOperator:175 - Reconciliation #2054(timer) Kafka(euler/euler-kafka): createOrUpdate failed
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: PATCH at: https://10.96.0.1/api/v1/namespaces/euler/services/euler-kafka-zookeeper-nodes. Message: Service "euler-kafka-zookeeper-nodes" is invalid: [spec.ipFamily: Invalid value: "null": field is immutable, spec.ipFamily: Required value]. Received status: Status(apiVersion=v1, code=422, details=StatusDetails(causes=[StatusCause(field=spec.ipFamily, message=Invalid value: "null": field is immutable, reason=FieldValueInvalid, additionalProperties={}), StatusCause(field=spec.ipFamily, message=Required value, reason=FieldValueRequired, additionalProperties={})], group=null, kind=Service, name=euler-kafka-zookeeper-nodes, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Service "euler-kafka-zookeeper-nodes" is invalid: [spec.ipFamily: Invalid value: "null": field is immutable, spec.ipFamily: Required value], metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, status=Failure, additionalProperties={}).
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:510) ~[io.fabric8.kubernetes-client-4.6.4.jar:?]
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:449) ~[io.fabric8.kubernetes-client-4.6.4.jar:?]
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:413) ~[io.fabric8.kubernetes-client-4.6.4.jar:?]
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:372) ~[io.fabric8.kubernetes-client-4.6.4.jar:?]
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handlePatch(OperationSupport.java:300) ~[io.fabric8.kubernetes-client-4.6.4.jar:?]
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handlePatch(BaseOperation.java:829) ~[io.fabric8.kubernetes-client-4.6.4.jar:?]
    at io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.lambda$patch$2(HasMetadataOperation.java:152) ~[io.fabric8.kubernetes-client-4.6.4.jar:?]
    at io.fabric8.kubernetes.api.model.DoneableService.done(DoneableService.java:26) ~[io.fabric8.kubernetes-model-4.6.4.jar:4.6.4]
    at io.fabric8.kubernetes.api.model.DoneableService.done(DoneableService.java:5) ~[io.fabric8.kubernetes-model-4.6.4.jar:4.6.4]
    at io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.patch(HasMetadataOperation.java:158) ~[io.fabric8.kubernetes-client-4.6.4.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.ServiceOperationsImpl.patch(ServiceOperationsImpl.java:80) ~[io.fabric8.kubernetes-client-4.6.4.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.ServiceOperationsImpl.patch(ServiceOperationsImpl.java:40) ~[io.fabric8.kubernetes-client-4.6.4.jar:?]
    at io.strimzi.operator.common.operator.resource.AbstractResourceOperator.internalPatch(AbstractResourceOperator.java:167) ~[io.strimzi.operator-common-0.19.0.jar:0.19.0]
    at io.strimzi.operator.common.operator.resource.AbstractResourceOperator.internalPatch(AbstractResourceOperator.java:162) ~[io.strimzi.operator-common-0.19.0.jar:0.19.0]
    at io.strimzi.operator.common.operator.resource.ServiceOperator.internalPatch(ServiceOperator.java:63) ~[io.strimzi.operator-common-0.19.0.jar:0.19.0]
    at io.strimzi.operator.common.operator.resource.ServiceOperator.internalPatch(ServiceOperator.java:20) ~[io.strimzi.operator-common-0.19.0.jar:0.19.0]
    at io.strimzi.operator.common.operator.resource.AbstractResourceOperator.lambda$reconcile$0(AbstractResourceOperator.java:103) ~[io.strimzi.operator-common-0.19.0.jar:0.19.0]
    at io.vertx.core.impl.ContextImpl.lambda$executeBlocking$2(ContextImpl.java:313) ~[io.vertx.vertx-core-3.9.1.jar:3.9.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty.netty-common-4.1.50.Final.jar:4.1.50.Final]
    at java.lang.Thread.run(Thread.java:834) [?:?]
2020-10-05 03:23:26 WARN AbstractOperator:330 - Reconciliation #2054(timer) Kafka(euler/euler-kafka): Failed to reconcile
    (same KubernetesClientException and stack trace as the ERROR entry above)
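The message itself points at the root cause: the operator builds its desired Service without spec.ipFamily, and the client-side patch is computed as a JSON diff between the live object and that desired object, so the diff asks the API server to remove a field it considers both immutable and required. The snippet below reproduces the diff in isolation with the zjsonpatch library (a standalone demonstration of the mechanism, not the operator's actual code):

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.flipkart.zjsonpatch.JsonDiff;

public class IpFamilyDiffDemo {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        // Live Service as returned by a dual-stack API server (abridged)
        JsonNode live = mapper.readTree(
                "{\"spec\":{\"clusterIP\":\"None\",\"ipFamily\":\"IPv4\"}}");
        // Desired Service built by the operator; ipFamily is never set
        JsonNode desired = mapper.readTree(
                "{\"spec\":{\"clusterIP\":\"None\"}}");
        // Prints [{"op":"remove","path":"/spec/ipFamily"}] -- a request to
        // clear an immutable, required field, hence the 422 above
        System.out.println(JsonDiff.asJson(live, desired));
    }
}
```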
Top GitHub Comments
- Last few issues left open: https://github.com/strimzi/strimzi-kafka-operator/milestone/19 … unless new issues are found I hope to do the first RC over the weekend.
- Ok, thanks for the testing. I opened #3757 to get the fix from the image you tested into the 0.20.0 release. Thanks for your help.
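The general pattern for fixing this class of problem is to carry the server-populated, immutable Service fields over from the live object into the desired one before patching. A minimal sketch with a hypothetical helper (not Strimzi's actual implementation, and assuming a fabric8 model version whose ServiceSpec exposes ipFamily):

```java
import io.fabric8.kubernetes.api.model.Service;

public class ImmutableServiceFields {
    /**
     * Copies API-server-managed, immutable fields from the live Service
     * into the desired Service so a diff-based PATCH never tries to
     * clear them. Hypothetical helper, for illustration only.
     */
    public static Service carryOver(Service live, Service desired) {
        if (live == null || live.getSpec() == null
                || desired == null || desired.getSpec() == null) {
            return desired;
        }
        if (desired.getSpec().getClusterIP() == null) {
            // clusterIP is allocated at creation time and immutable
            desired.getSpec().setClusterIP(live.getSpec().getClusterIP());
        }
        if (desired.getSpec().getIpFamily() == null) {
            // ipFamily is defaulted by dual-stack API servers and immutable
            desired.getSpec().setIpFamily(live.getSpec().getIpFamily());
        }
        return desired;
    }
}
```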