
NPE when incorrect topologySpreadConstraints are applied on a fresh cluster

See original GitHub issue

Describe the bug

I was trying out topologySpreadConstraints in our testing environment. I added the following configuration for the broker:

topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
              - kafka.broker
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
              - kafka.zookeeper

The second item in the constraints array was a mistake and was supposed to be applied to zookeeper. Nonetheless, this is what I observed:
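For reference, the intended layout puts each constraint under its own component's pod template in the Kafka custom resource. This is a sketch based on Strimzi's `template.pod.topologySpreadConstraints` field; the `app` label values are carried over from the snippet above and are only valid if they match the labels the pods actually carry:

```yaml
spec:
  kafka:
    template:
      pod:
        topologySpreadConstraints:
          - maxSkew: 1
            topologyKey: topology.kubernetes.io/zone
            whenUnsatisfiable: ScheduleAnyway
            labelSelector:
              matchExpressions:
                - key: app
                  operator: In
                  values:
                    - kafka.broker
  zookeeper:
    template:
      pod:
        topologySpreadConstraints:
          - maxSkew: 1
            topologyKey: topology.kubernetes.io/zone
            whenUnsatisfiable: ScheduleAnyway
            labelSelector:
              matchExpressions:
                - key: app
                  operator: In
                  values:
                    - kafka.zookeeper
```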

  1. Updating an existing Kafka cluster:
  • No effect.
  • But I haven’t tested scaling out brokers; I suspect it would fail.
  2. Creating a new Kafka cluster with this config:
  • The operator throws an NPE.
  • I suspect it has to do with this line, where it gets the pod info using the fabric8 client, but the metadata is missing since it’s looking up the wrong label?

NPE stacktrace:

2022-06-27 06:44:46 ERROR AbstractOperator:247 - Reconciliation #18(timer) Kafka(kafka/kafka-cluster): createOrUpdate failed
java.lang.NullPointerException: null
	at io.strimzi.operator.cluster.operator.resource.KafkaRoller.restartAndAwaitReadiness(KafkaRoller.java:637) ~[io.strimzi.cluster-operator-0.27.1.jar:0.27.1]
	at io.strimzi.operator.cluster.operator.resource.KafkaRoller.restartIfNecessary(KafkaRoller.java:364) ~[io.strimzi.cluster-operator-0.27.1.jar:0.27.1]
	at io.strimzi.operator.cluster.operator.resource.KafkaRoller.lambda$schedule$6(KafkaRoller.java:277) ~[io.strimzi.cluster-operator-0.27.1.jar:0.27.1]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]

This was no longer an issue once the zookeeper topology constraint was removed from the broker config, but I think we should handle this error better instead of throwing a raw NPE.
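The kind of guard that would turn this NPE into a descriptive error could look roughly like the sketch below. This is not the actual KafkaRoller code: `PodLookupGuard`, the map-backed lookup, and the error message are hypothetical stand-ins for fabric8's `pods().inNamespace(ns).withName(name).get()` call, which returns null when no matching pod exists.

```java
import java.util.Map;

// Illustrative stand-in for KafkaRoller's pod lookup (all names are hypothetical).
public class PodLookupGuard {

    // Simulates a client lookup such as fabric8's
    // pods().inNamespace(ns).withName(name).get(), which returns
    // null when the pod does not exist.
    static String findPod(Map<String, String> podsByName, String name) {
        return podsByName.get(name);
    }

    // Fail fast with a descriptive error instead of letting a null
    // flow onward and surface later as a bare NullPointerException.
    static String requirePod(Map<String, String> podsByName, String name) {
        String pod = findPod(podsByName, name);
        if (pod == null) {
            throw new IllegalStateException(
                "Pod " + name + " was not found; check that the labels used in "
                + "topologySpreadConstraints match the pods they are meant to target");
        }
        return pod;
    }
}
```

The point of the pattern is only that the null is caught at the lookup site, where enough context exists to say *what* was missing, rather than several frames later inside `restartAndAwaitReadiness`.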

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
scholzj commented, Jun 30, 2022

It might be best to fix this after #6663 is merged, to avoid conflicts. The NPEs are not nice, but technically this would end in an error anyway, just a different one. So I don’t think it matters much whether it is fixed now or in a few days.

1 reaction
scholzj commented, Jun 30, 2022

Great, thanks. I will have a look.

Read more comments on GitHub >

Top Results From Across the Web

  • Pod Topology Spread Constraints - Kubernetes
    You can use topology spread constraints to control how Pods are spread across your cluster among failure-domains such as regions, zones, ...
  • OpenShift Container Platform 4.6 release notes | OKD 4.6
    The image pruner now tolerates invalid image references by default on new installations of OKD, which allows pruning to continue even if it...
  • Tencent Kubernetes Engine Best Practices
    Clusters with most new features are preferably managed. ... The node pool is mainly used to batch manage nodes with the following items:....
  • 4.6.17 - Release Status
    cluster-node-tuned; cluster-svcat-apiserver-operator ... Bug 1874713: deployment: don't panic when applying deployment fails #332 · Bug 1870565: deployment: ...
  • Controlling pod placement using pod topology spread ...
    All pod topology spread constraints must be satisfied for a pod to be placed. Prerequisites. A cluster administrator has added the required labels...
