
Cluster operator generated new certificates for no reason

See original GitHub issue

We use Strimzi 0.17.2 on OpenShift v3.11 with Trident for storage.

The cluster resources are labeled strimzi.io/kind=Kafka and strimzi.io/cluster=cluster-kafka-persistent, and the Kafka custom resource is:

apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: cluster-kafka-persistent
spec:
  kafka:
    authorization:
      type: simple
    version: 2.4.0
    replicas: 5
    listeners:
      external:
        authentication:
          type: scram-sha-512
        type: route
    config:
      offsets.topic.replication.factor: 5
      transaction.state.log.replication.factor: 5
      transaction.state.log.min.isr: 3
      log.message.format.version: "2.4"
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
      class: backend-silver

  zookeeper:
    replicas: 5
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
      class: backend-silver

  clusterCa:
    generateCertificateAuthority: true
    validityDays: 1460

  clientsCa:
    generateCertificateAuthority: true
    validityDays: 1460

  entityOperator:
    topicOperator: {}
    userOperator: {}

We set the certificate validity to 1460 days. Three days ago, for no apparent reason, the cluster operator regenerated these secrets:

  • cluster-kafka-persistent-clients-ca
  • cluster-kafka-persistent-clients-ca-cert
  • cluster-kafka-persistent-cluster-ca
  • cluster-kafka-persistent-cluster-ca-cert
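
For reference, the validity window of the regenerated CA certificates can be checked straight from the secrets. This is a minimal sketch, assuming Strimzi's default secret layout where the CA certificate is stored under the ca.crt key:

# Inspect the validity dates of the regenerated cluster CA certificate
# (secret and key names assume Strimzi's default naming for this cluster).
oc get secret cluster-kafka-persistent-cluster-ca-cert -n kafka-uat \
  -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -noout -dates

# Same check for the clients CA certificate.
oc get secret cluster-kafka-persistent-clients-ca-cert -n kafka-uat \
  -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -noout -dates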

We ended up in this state:

[manage_strimzi]# oc get all
NAME                                           READY     STATUS             RESTARTS   AGE
pod/cluster-kafka-persistent-zookeeper-0       1/2       CrashLoopBackOff   812        2d
pod/cluster-kafka-persistent-zookeeper-1       1/2       CrashLoopBackOff   802        2d
pod/cluster-kafka-persistent-zookeeper-2       1/2       CrashLoopBackOff   812        2d
pod/cluster-kafka-persistent-zookeeper-3       1/2       CrashLoopBackOff   813        2d
pod/cluster-kafka-persistent-zookeeper-4       1/2       CrashLoopBackOff   803        2d
pod/strimzi-cluster-operator-d5b6c6458-fpbqx   1/1       Running            146        19d

NAME                                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/cluster-kafka-persistent-zookeeper-client   ClusterIP   172.30.253.229   <none>        2181/TCP                     2d
service/cluster-kafka-persistent-zookeeper-nodes    ClusterIP   None             <none>        2181/TCP,2888/TCP,3888/TCP   2d

NAME                                       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/strimzi-cluster-operator   1         1         1            1           225d

NAME                                                 DESIRED   CURRENT   READY     AGE
replicaset.apps/strimzi-cluster-operator-d5b6c6458   1         1         1         167d

NAME                                                  DESIRED   CURRENT   AGE
statefulset.apps/cluster-kafka-persistent-zookeeper   5         5         2d
[manage_strimzi]#_
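
To see which container in the ZooKeeper pods is crash looping, something like the following can help. The tls-sidecar container name is an assumption based on the Strimzi 0.17 pod layout, where each ZooKeeper pod runs a zookeeper container plus a TLS sidecar:

# List the containers in one ZooKeeper pod, then pull the logs of the failing one.
oc get pod cluster-kafka-persistent-zookeeper-0 -n kafka-uat \
  -o jsonpath='{.spec.containers[*].name}'
# "tls-sidecar" is assumed; replace it with whatever the command above reports.
oc logs cluster-kafka-persistent-zookeeper-0 -c tls-sidecar -n kafka-uat --previous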

We then tried to relaunch the pods by stopping and restarting the operator with these commands:

oc scale deployment.apps/strimzi-cluster-operator --replicas=0 -n kafka-uat
oc delete statefulset.apps/cluster-kafka-persistent-kafka statefulset.apps/cluster-kafka-persistent-zookeeper -n kafka-uat
oc scale deployment.apps/strimzi-cluster-operator --replicas=1 -n kafka-uat
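
After scaling the operator back up, the next reconciliation should recreate the StatefulSets; progress can be watched with, for example:

# Watch the operator recreate the StatefulSets and bring the pods back up.
oc get statefulsets,pods -n kafka-uat -w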

We did not manage to start the pods. The cluster operator pod logged this error:

2020-09-24 09:40:40 ERROR AbstractOperator:124 - Reconciliation #1(timer) Kafka(kafka-uat/cluster-kafka-persistent): createOrUpdate failed
java.lang.NullPointerException: null
at io.strimzi.operator.cluster.model.ModelUtils.buildSecret(ModelUtils.java:248) ~[io.strimzi.cluster-operator-0.17.0.jar:0.17.0]
at io.strimzi.operator.cluster.operator.assembly.KafkaAssemblyOperator$ReconciliationState.clusterOperatorSecret(KafkaAssemblyOperator.java:3187) ~[io.strimzi.cluster-operator-0.17.0.jar:0.17.0]
at io.strimzi.operator.cluster.operator.assembly.KafkaAssemblyOperator.lambda$reconcile$3(KafkaAssemblyOperator.java:254) ~[io.strimzi.cluster-operator-0.17.0.jar:0.17.0]
at io.vertx.core.Future.lambda$compose$3(Future.java:360) ~[io.vertx.vertx-core-3.8.5.jar:3.8.5]
at io.vertx.core.impl.FutureImpl.dispatch(FutureImpl.java:107) ~[io.vertx.vertx-core-3.8.5.jar:3.8.5]
at io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:152) ~[io.vertx.vertx-core-3.8.5.jar:3.8.5]
at io.vertx.core.impl.FutureImpl.complete(FutureImpl.java:113) ~[io.vertx.vertx-core-3.8.5.jar:3.8.5]
at io.vertx.core.impl.FutureImpl.handle(FutureImpl.java:178) ~[io.vertx.vertx-core-3.8.5.jar:3.8.5]
at io.vertx.core.impl.FutureImpl.handle(FutureImpl.java:21) ~[io.vertx.vertx-core-3.8.5.jar:3.8.5]
at io.vertx.core.impl.FutureImpl.dispatch(FutureImpl.java:107) ~[io.vertx.vertx-core-3.8.5.jar:3.8.5]
at io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:152) ~[io.vertx.vertx-core-3.8.5.jar:3.8.5]
at io.vertx.core.impl.FutureImpl.complete(FutureImpl.java:113) ~[io.vertx.vertx-core-3.8.5.jar:3.8.5]
at io.vertx.core.impl.FutureImpl.handle(FutureImpl.java:178) ~[io.vertx.vertx-core-3.8.5.jar:3.8.5]
at io.vertx.core.impl.FutureImpl.handle(FutureImpl.java:21) ~[io.vertx.vertx-core-3.8.5.jar:3.8.5]
at io.vertx.core.impl.ContextImpl.lambda$null$0(ContextImpl.java:330) ~[io.vertx.vertx-core-3.8.5.jar:3.8.5]
at io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:369) ~[io.vertx.vertx-core-3.8.5.jar:3.8.5]

I had no option but to delete everything and reinstall. Do you have any idea why this happened? I also noticed that the cluster operator restarted many times (146 restarts in the output above); could there be a memory leak in this pod?

PS: I will increase the JVM memory for the cluster operator pod.
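
A minimal sketch of doing that, assuming the standard strimzi-cluster-operator Deployment in the kafka-uat namespace (the memory values below are placeholders, not recommendations):

# Raise the memory request/limit on the cluster operator container.
oc set resources deployment/strimzi-cluster-operator -n kafka-uat \
  --requests=memory=384Mi --limits=memory=768Mi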

Best regards, Toty

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
scholzj commented, Sep 28, 2020

It looks like the secret with the certificate got deleted or damaged for some reason. Do you have a full log from the cluster operator from when it happened? (Ideally at DEBUG level, but even without that it might be helpful.) The NullPointerException is probably just a follow-up failure rather than the actual cause.
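
For anyone who hits the same thing, the operator log can be captured along these lines (a sketch; the STRIMZI_LOG_LEVEL environment variable is assumed to be the logging knob for this operator version):

# Save the current and, if present, the previous cluster operator log.
oc logs deployment/strimzi-cluster-operator -n kafka-uat > cluster-operator.log
oc logs deployment/strimzi-cluster-operator -n kafka-uat --previous > cluster-operator-previous.log || true
# Assumed: raising the log level via the STRIMZI_LOG_LEVEL env var (this rolls the operator pod).
oc set env deployment/strimzi-cluster-operator -n kafka-uat STRIMZI_LOG_LEVEL=DEBUG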

0 reactions
totee19 commented, Sep 30, 2020

I know it’s difficult when we don’t have a log file. I hope it doesn’t happen again. Thanks for your support.

Read more comments on GitHub >

Top Results From Across the Web

Troubleshooting Operator issues - OpenShift Documentation
Default OpenShift Container Platform cluster Operators are managed by the Cluster Version Operator (CVO) and they do not have a Subscription ...
Read more >
Cluster network operator pod 's internal webhook exposes an ...
Cluster network operator pod 's internal webhook exposes an API which certificate could eventually expire. - Red Hat Customer Portal.
Read more >
Issues with Certificate manager (cert-manager) while upgrading
Resolving the problem. Ensure you delete the resources created by the previous Certificate manager (cert-manager) to allow the operator to create new resources....
Read more >
Update security certificates with a different CA | Elasticsearch ...
On any node in your cluster, generate a new CA certificate. You only need to complete this step one time. If you're using...
Read more >
Certificate management | CockroachDB Docs
How to authenticate a secure 3-node CockroachDB cluster with Kubernetes. ... By default, the Operator will generate and sign 1 client and 1...
Read more >
