
[Bug] failed to reconcile zookeeper-nodes service in the dual-stack kubernetes cluster

See original GitHub issue

Describe the bug

The strimzi-kafka-operator fails to patch the zookeeper-nodes Service: the Service's spec.ipFamily field is immutable in a dual-stack Kubernetes cluster. See https://kubernetes.io/docs/concepts/services-networking/dual-stack/
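
The 422 in the logs below comes from the API server's immutability check: once a dual-stack cluster has defaulted spec.ipFamily on the live Service, a PATCH whose body leaves the field unset reads as setting it to null. A minimal Python sketch of that check (illustrative only, not Kubernetes source; the concrete spec values are assumed):

```python
# Illustrative sketch (not Kubernetes source): why a PATCH that omits
# spec.ipFamily fails once the API server has defaulted the field.
# The messages mirror the 422 response in the operator logs below.
def validate_service_update(current_spec, patched_spec):
    errors = []
    old = current_spec.get("ipFamily")
    new = patched_spec.get("ipFamily")
    shown = "null" if new is None else new
    if old is not None and new != old:
        # immutable: any change (including dropping to null) is rejected
        errors.append(f'spec.ipFamily: Invalid value: "{shown}": field is immutable')
    if new is None:
        # the field is also required once the cluster runs dual-stack
        errors.append("spec.ipFamily: Required value")
    return errors

current = {"clusterIP": "10.96.12.34", "ipFamily": "IPv4"}  # live Service (values assumed)
desired = {"clusterIP": "10.96.12.34"}  # operator's desired spec, ipFamily unset
print(validate_service_update(current, desired))
```

Any client that rebuilds the Service spec from the Kafka CR alone will omit the server-defaulted field and trip this check, which is why the operator's PATCH fails on every reconciliation.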

To Reproduce

Steps to reproduce the behavior:

  1. Create a dual-stack Kubernetes cluster.
  2. Install the strimzi-kafka-operator (v0.19.0).
  3. Create a Kafka cluster (the cluster is created, but reconciliation fails).

Environment:

  • Strimzi version: 0.19.0
  • Installation method: Helm
  • Kubernetes cluster: Kubernetes 1.18.4 (dual-stack)

YAML files and logs

cluster.yaml:

apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: euler-kafka
  namespace: euler
spec:
  cruiseControl:
    brokerCapacity:
      cpuUtilization: 80
      disk: 10Gi
      inboundNetwork: 50MiB/s
      outboundNetwork: 50MiB/s
    config:
      default.replica.movement.strategies: com.linkedin.kafka.cruisecontrol.executor.strategy.PostponeUrpReplicaMovementStrategy
    resources:
      limits:
        memory: 1Gi
      requests:
        cpu: 500m
        memory: 512Mi
    template:
      pod:
        securityContext:
          fsGroup: 1000
          runAsGroup: 1000
          runAsUser: 1000
  entityOperator:
    template:
      pod:
        securityContext:
          fsGroup: 1000
          runAsGroup: 1000
          runAsUser: 1000
    topicOperator:
      resources:
        limits:
          memory: 1Gi
        requests:
          cpu: 1000m
          memory: 512Mi
    userOperator:
      livenessProbe:
        initialDelaySeconds: 60
      readinessProbe:
        initialDelaySeconds: 60
      resources:
        limits:
          cpu: 100m
          memory: 256Mi
  kafka:
    config:
      auto.create.topics.enable: "false"
      log.message.format.version: "2.5"
      num.recovery.threads.per.data.dir: 2
      offsets.topic.replication.factor: 3
      socket.receive.buffer.bytes: -1
      socket.send.buffer.bytes: -1
      transaction.state.log.min.isr: 2
      transaction.state.log.replication.factor: 3
    jvmOptions:
      -Xms: 1024m
      -Xmx: 1024m
    listeners:
      plain: {}
      tls: {}
    metrics:
      lowercaseOutputName: true
      rules:
      - labels:
          clientId: $3
          partition: $5
          topic: $4
        name: kafka_server_$1_$2
        pattern: kafka.server<type=(.+), name=(.+), clientId=(.+), topic=(.+), partition=(.*)><>Value
        type: GAUGE
      - labels:
          broker: $4:$5
          clientId: $3
        name: kafka_server_$1_$2
        pattern: kafka.server<type=(.+), name=(.+), clientId=(.+), brokerHost=(.+), brokerPort=(.+)><>Value
        type: GAUGE
      - labels:
          cipher: $5
          listener: $2
          networkProcessor: $3
          protocol: $4
        name: kafka_server_$1_connections_tls_info
        pattern: kafka.server<type=(.+), cipher=(.+), protocol=(.+), listener=(.+), networkProcessor=(.+)><>connections
        type: GAUGE
      - labels:
          clientSoftwareName: $2
          clientSoftwareVersion: $3
          listener: $4
          networkProcessor: $5
        name: kafka_server_$1_connections_software
        pattern: kafka.server<type=(.+), clientSoftwareName=(.+), clientSoftwareVersion=(.+), listener=(.+), networkProcessor=(.+)><>connections
        type: GAUGE
      - labels:
          listener: $2
          networkProcessor: $3
        name: kafka_server_$1_$4
        pattern: 'kafka.server<type=(.+), listener=(.+), networkProcessor=(.+)><>(.+):'
        type: GAUGE
      - labels:
          listener: $2
          networkProcessor: $3
        name: kafka_server_$1_$4
        pattern: kafka.server<type=(.+), listener=(.+), networkProcessor=(.+)><>(.+)
        type: GAUGE
      - name: kafka_$1_$2_$3_percent
        pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*><>MeanRate
        type: GAUGE
      - name: kafka_$1_$2_$3_percent
        pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*><>Value
        type: GAUGE
      - labels:
          $4: $5
        name: kafka_$1_$2_$3_percent
        pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*, (.+)=(.+)><>Value
        type: GAUGE
      - labels:
          $4: $5
          $6: $7
        name: kafka_$1_$2_$3_total
        pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, (.+)=(.+), (.+)=(.+)><>Count
        type: COUNTER
      - labels:
          $4: $5
        name: kafka_$1_$2_$3_total
        pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, (.+)=(.+)><>Count
        type: COUNTER
      - name: kafka_$1_$2_$3_total
        pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*><>Count
        type: COUNTER
      - labels:
          $4: $5
          $6: $7
        name: kafka_$1_$2_$3
        pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+), (.+)=(.+)><>Value
        type: GAUGE
      - labels:
          $4: $5
        name: kafka_$1_$2_$3
        pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+)><>Value
        type: GAUGE
      - name: kafka_$1_$2_$3
        pattern: kafka.(\w+)<type=(.+), name=(.+)><>Value
        type: GAUGE
      - labels:
          $4: $5
          $6: $7
        name: kafka_$1_$2_$3_count
        pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+), (.+)=(.+)><>Count
        type: COUNTER
      - labels:
          $4: $5
          $6: $7
          quantile: 0.$8
        name: kafka_$1_$2_$3
        pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.*), (.+)=(.+)><>(\d+)thPercentile
        type: GAUGE
      - labels:
          $4: $5
        name: kafka_$1_$2_$3_count
        pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+)><>Count
        type: COUNTER
      - labels:
          $4: $5
          quantile: 0.$6
        name: kafka_$1_$2_$3
        pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.*)><>(\d+)thPercentile
        type: GAUGE
      - name: kafka_$1_$2_$3_count
        pattern: kafka.(\w+)<type=(.+), name=(.+)><>Count
        type: COUNTER
      - labels:
          quantile: 0.$4
        name: kafka_$1_$2_$3
        pattern: kafka.(\w+)<type=(.+), name=(.+)><>(\d+)thPercentile
        type: GAUGE
    rack:
      topologyKey: kubernetes.io/hostname
    replicas: 3
    resources:
      limits:
        memory: 6Gi
      requests:
        cpu: 2000m
        memory: 6Gi
    storage:
      type: jbod
      volumes:
      - class: drbd
        deleteClaim: false
        id: 0
        size: 10Gi
        type: persistent-claim
    template:
      pod:
        securityContext:
          fsGroup: 1000
          runAsGroup: 1000
          runAsUser: 1000
          sysctls:
          - name: net.ipv4.tcp_syncookies
            value: "0"
    version: 2.5.0
  kafkaExporter:
    groupRegex: .*
    resources:
      limits:
        cpu: 500m
        memory: 256Mi
    template:
      pod:
        securityContext:
          fsGroup: 1000
          runAsGroup: 1000
          runAsUser: 1000
    topicRegex: .*
  zookeeper:
    jvmOptions:
      -Xms: 1024m
      -Xmx: 1024m
    metrics:
      lowercaseOutputName: true
      rules:
      - name: zookeeper_$2
        pattern: org.apache.ZooKeeperService<name0=ReplicatedServer_id(\d+)><>(\w+)
        type: GAUGE
      - labels:
          replicaId: $2
        name: zookeeper_$3
        pattern: org.apache.ZooKeeperService<name0=ReplicatedServer_id(\d+), name1=replica.(\d+)><>(\w+)
        type: GAUGE
      - labels:
          memberType: $3
          replicaId: $2
        name: zookeeper_$4
        pattern: org.apache.ZooKeeperService<name0=ReplicatedServer_id(\d+), name1=replica.(\d+), name2=(\w+)><>(Packets\w+)
        type: COUNTER
      - labels:
          memberType: $3
          replicaId: $2
        name: zookeeper_$4
        pattern: org.apache.ZooKeeperService<name0=ReplicatedServer_id(\d+), name1=replica.(\d+), name2=(\w+)><>(\w+)
        type: GAUGE
      - labels:
          memberType: $3
          replicaId: $2
        name: zookeeper_$4_$5
        pattern: org.apache.ZooKeeperService<name0=ReplicatedServer_id(\d+), name1=replica.(\d+), name2=(\w+), name3=(\w+)><>(\w+)
        type: GAUGE
    replicas: 3
    resources:
      limits:
        memory: 4Gi
      requests:
        cpu: 1000m
        memory: 4Gi
    storage:
      class: drbd
      deleteClaim: false
      size: 5Gi
      type: persistent-claim
    template:
      pod:
        securityContext:
          fsGroup: 1000
          runAsGroup: 1000
          runAsUser: 1000
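
The metrics rules in the YAML above follow the JMX Prometheus exporter's pattern/name convention: `pattern` captures parts of the MBean name and `name` re-inserts them via $1..$n, with lowercaseOutputName lowercasing the result. A hypothetical mini-version of that substitution (not the exporter's actual code):

```python
import re

# Hypothetical mini-version of the JMX Prometheus exporter rule engine:
# match the MBean name against `pattern`, then substitute capture groups
# into the `name` template ($1..$n).
def apply_rule(pattern, name_template, mbean, lowercase_output_name=True):
    m = re.match(pattern, mbean)
    if not m:
        return None  # rule does not apply to this MBean
    metric = name_template
    for i, group in enumerate(m.groups(), start=1):
        metric = metric.replace(f"${i}", group)
    return metric.lower() if lowercase_output_name else metric

rule = r"kafka\.(\w+)<type=(.+), name=(.+)><>Value"
mbean = "kafka.server<type=ReplicaManager, name=LeaderCount><>Value"  # sample MBean (assumed)
print(apply_rule(rule, "kafka_$1_$2_$3", mbean))
# -> kafka_server_replicamanager_leadercount
```

The ordering of the rules matters in the real exporter: more specific patterns (with clientId, partition, etc.) are listed first so the catch-all rules at the bottom only see what remains.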

operator logs:

2020-10-05 03:23:26 ERROR AbstractOperator:175 - Reconciliation #2054(timer) Kafka(euler/euler-kafka): createOrUpdate failed
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: PATCH at: https://10.96.0.1/api/v1/namespaces/euler/services/euler-kafka-zookeeper-nodes. Message: Service "euler-kafka-zookeeper-nodes" is invalid: [spec.ipFamily: Invalid value: "null": field is immutable, spec.ipFamily: Required value]. Received status: Status(apiVersion=v1, code=422, details=StatusDetails(causes=[StatusCause(field=spec.ipFamily, message=Invalid value: "null": field is immutable, reason=FieldValueInvalid, additionalProperties={}), StatusCause(field=spec.ipFamily, message=Required value, reason=FieldValueRequired, additionalProperties={})], group=null, kind=Service, name=euler-kafka-zookeeper-nodes, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Service "euler-kafka-zookeeper-nodes" is invalid: [spec.ipFamily: Invalid value: "null": field is immutable, spec.ipFamily: Required value], metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, status=Failure, additionalProperties={}).
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:510) ~[io.fabric8.kubernetes-client-4.6.4.jar:?]
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:449) ~[io.fabric8.kubernetes-client-4.6.4.jar:?]
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:413) ~[io.fabric8.kubernetes-client-4.6.4.jar:?]
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:372) ~[io.fabric8.kubernetes-client-4.6.4.jar:?]
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handlePatch(OperationSupport.java:300) ~[io.fabric8.kubernetes-client-4.6.4.jar:?]
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handlePatch(BaseOperation.java:829) ~[io.fabric8.kubernetes-client-4.6.4.jar:?]
        at io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.lambda$patch$2(HasMetadataOperation.java:152) ~[io.fabric8.kubernetes-client-4.6.4.jar:?]
        at io.fabric8.kubernetes.api.model.DoneableService.done(DoneableService.java:26) ~[io.fabric8.kubernetes-model-4.6.4.jar:4.6.4]
        at io.fabric8.kubernetes.api.model.DoneableService.done(DoneableService.java:5) ~[io.fabric8.kubernetes-model-4.6.4.jar:4.6.4]
        at io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.patch(HasMetadataOperation.java:158) ~[io.fabric8.kubernetes-client-4.6.4.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.ServiceOperationsImpl.patch(ServiceOperationsImpl.java:80) ~[io.fabric8.kubernetes-client-4.6.4.jar:?]
        at io.fabric8.kubernetes.client.dsl.internal.ServiceOperationsImpl.patch(ServiceOperationsImpl.java:40) ~[io.fabric8.kubernetes-client-4.6.4.jar:?]
        at io.strimzi.operator.common.operator.resource.AbstractResourceOperator.internalPatch(AbstractResourceOperator.java:167) ~[io.strimzi.operator-common-0.19.0.jar:0.19.0]
        at io.strimzi.operator.common.operator.resource.AbstractResourceOperator.internalPatch(AbstractResourceOperator.java:162) ~[io.strimzi.operator-common-0.19.0.jar:0.19.0]
        at io.strimzi.operator.common.operator.resource.ServiceOperator.internalPatch(ServiceOperator.java:63) ~[io.strimzi.operator-common-0.19.0.jar:0.19.0]
        at io.strimzi.operator.common.operator.resource.ServiceOperator.internalPatch(ServiceOperator.java:20) ~[io.strimzi.operator-common-0.19.0.jar:0.19.0]
        at io.strimzi.operator.common.operator.resource.AbstractResourceOperator.lambda$reconcile$0(AbstractResourceOperator.java:103) ~[io.strimzi.operator-common-0.19.0.jar:0.19.0]
        at io.vertx.core.impl.ContextImpl.lambda$executeBlocking$2(ContextImpl.java:313) ~[io.vertx.vertx-core-3.9.1.jar:3.9.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty.netty-common-4.1.50.Final.jar:4.1.50.Final]
        at java.lang.Thread.run(Thread.java:834) [?:?]
2020-10-05 03:23:26 WARN  AbstractOperator:330 - Reconciliation #2054(timer) Kafka(euler/euler-kafka): Failed to reconcile

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 12 (6 by maintainers)

Top GitHub Comments

1 reaction
scholzj commented, Oct 8, 2020

Last few issues left open: https://github.com/strimzi/strimzi-kafka-operator/milestone/19 … unless new issues are found I hope to do the first RC over the weekend.

1 reaction
scholzj commented, Oct 7, 2020

Ok, thanks for the testing. I opened #3757 to get the fix from the image you tested into the 0.20.0 release. Thanks for your help.
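
The fix referenced above shipped in 0.20.0. As a general sketch of how operators avoid this class of error (illustrative only, not the actual Strimzi change in #3757; the field list is an assumption for the example):

```python
# Illustrative sketch (not the actual Strimzi patch): before PATCHing, copy
# server-populated immutable fields from the live Service into the desired
# spec, so the diff sent to the API server never nulls them.
SERVER_MANAGED_FIELDS = ("clusterIP", "ipFamily")  # assumed field list

def carry_over_immutable_fields(current_spec, desired_spec):
    patched = dict(desired_spec)
    for field in SERVER_MANAGED_FIELDS:
        if field not in patched and field in current_spec:
            # preserve the value Kubernetes already chose for the live object
            patched[field] = current_spec[field]
    return patched

current = {"clusterIP": "10.96.12.34", "ipFamily": "IPv4"}  # live Service (values assumed)
desired = {"ports": [{"name": "clients", "port": 2181}]}    # spec rebuilt from the CR
patched = carry_over_immutable_fields(current, desired)
print(patched["ipFamily"])  # ipFamily survives the patch
```

With this carry-over in place, the patch only changes fields the operator actually manages, and the API server's immutability check on spec.ipFamily is never triggered.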
