[Bug] Operator continuously updates resources in AKS

Describe the bug

In our case, we noticed that when we try to create an internal LB in AKS managed by Rancher (this is important) using the following annotations:

externalBootstrapService:
  metadata:
    annotations:
      service.beta.kubernetes.io/azure-load-balancer-internal: "true"
perPodService:
  metadata:
    annotations:
      service.beta.kubernetes.io/azure-load-balancer-internal: "true"

we see that the time needed to create the LB is ~15 min. The root cause is that the Strimzi operator and Rancher (which adds custom annotations like /metadata/annotations/field.cattle.io~1publicEndpoints) simultaneously try to change the same Services of type LoadBalancer. Rancher adds some labels and annotations to the Services, and the Strimzi operator removes them. This cycle continuously updates the Azure load balancer configuration. As a result, the Azure load balancer stays in an “updating” state with a long processing queue.
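The conflict described above can be illustrated with a toy model. This is only a sketch: it calls no Kubernetes, Rancher, or Strimzi API, the function names are hypothetical, and the annotation value `"[...]"` is a placeholder. It assumes each controller writes to the Service whenever the live annotations differ from what it expects.

```python
# Toy model of two controllers fighting over one Service's annotations.
# All names are hypothetical; nothing here talks to a real cluster.

RANCHER_KEY = "field.cattle.io/publicEndpoints"
INTERNAL_LB_KEY = "service.beta.kubernetes.io/azure-load-balancer-internal"


def rancher_reconcile(annotations):
    """Add Rancher's annotation if it is missing. Returns True if the Service changed."""
    if RANCHER_KEY in annotations:
        return False
    annotations[RANCHER_KEY] = "[...]"  # placeholder value
    return True


def operator_reconcile(annotations, desired):
    """Overwrite the live annotations with the operator's desired set. Returns True if changed."""
    if annotations == desired:
        return False
    annotations.clear()
    annotations.update(desired)
    return True


def count_updates(rounds):
    """Count how many writes hit the Service over the given number of rounds."""
    desired = {INTERNAL_LB_KEY: "true"}
    live = dict(desired)
    updates = 0
    for _ in range(rounds):
        updates += rancher_reconcile(live)
        updates += operator_reconcile(live, desired)
    return updates
```

In this model every round produces two writes and the state never settles, which matches the reported symptom of a load balancer that stays in an "updating" state.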

To Reproduce

Steps to reproduce the behavior:

  1. Create an internal LB for the external listener
  2. Create an AKS cluster managed by Rancher (https://rancher.com/docs/rancher/v2.x/en/cluster-provisioning/hosted-kubernetes-clusters/aks/)
  3. Create the internal LB by adding the annotations: creation takes more than 15 min

Expected behavior

LB creation takes less than 2 min.

Environment (please complete the following information):

  • Strimzi version: 0.18
  • Installation method: Helm chart
  • Kubernetes cluster: Kubernetes 1.18
  • Infrastructure: Azure AKS/Rancher

YAML files and logs

Additional context

Suggestion: add the ability to ignore changes to annotations/labels on resources watched by the operator.

Example:
ignore_changes = [
      metadata[0].annotations["cattle.io/*"],
      metadata[0].annotations["field.cattle.io~1publicEndpoints"]
    ]
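As a rough sketch of how such ignore patterns could be evaluated, the snippet below matches annotation keys against glob-style patterns. It is not an existing Strimzi option; the function and variable names are hypothetical. It relies on the fact that `~1` is the JSON Pointer (RFC 6901) escape for `/`, so `field.cattle.io~1publicEndpoints` refers to the annotation key `field.cattle.io/publicEndpoints`.

```python
from fnmatch import fnmatchcase


def unescape_json_pointer(token):
    """Decode a JSON Pointer token (RFC 6901): "~1" is "/", "~0" is "~"."""
    # "~1" must be decoded before "~0" so that "~01" becomes "~1", not "/".
    return token.replace("~1", "/").replace("~0", "~")


# Patterns taken from the suggestion above, normalized to plain annotation keys.
RAW_PATTERNS = ["cattle.io/*", "field.cattle.io~1publicEndpoints"]
PATTERNS = [unescape_json_pointer(p) for p in RAW_PATTERNS]


def is_ignored(annotation_key, patterns=PATTERNS):
    """Return True if the annotation key matches any ignore pattern."""
    return any(fnmatchcase(annotation_key, p) for p in patterns)
```

Note that `cattle.io/*` alone does not cover `field.cattle.io/publicEndpoints`, because the glob is matched against the whole key; that is presumably why the suggestion lists both entries.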

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 2
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

scholzj commented, Dec 7, 2020 (1 reaction)

@mluiten Strimzi does not remove the annotation per se. It just reconciles the resource, i.e. applies our version of it, which removes these annotations because our version does not have them. We do not have any whitelists or blacklists because they were never needed until this case. But for this particular use case I would assume one can add it here: https://github.com/strimzi/strimzi-kafka-operator/blob/3e76f37c696bbb2712477eb541edb431bc8164fe/operator-common/src/main/java/io/strimzi/operator/common/operator/resource/ServiceOperator.java#L54

There we already handle some similar things in the service spec section, such as assigned node ports or ipFamily. So I assume we can have an allow-list of annotations that would be backported from the original service so that they do not get deleted in the patch. That is at least where I planned to start … but of course contributions are always welcome, so if you want to look into it, you are more than welcome.
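The allow-list idea in this comment could look roughly like the sketch below. It is not the actual Strimzi implementation (which is Java, in ServiceOperator); it uses plain dictionaries in place of Service objects, and `merge_annotations` and the glob-style allow-list are assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def merge_annotations(current, desired, allow_list):
    """Build the annotations to apply in the patch.

    Starts from the operator's desired annotations and copies over any
    allow-listed annotation found on the live Service. Desired values win
    on conflict; live annotations outside the allow-list are dropped.
    """
    merged = dict(desired)
    for key, value in current.items():
        if key not in merged and any(fnmatchcase(key, p) for p in allow_list):
            merged[key] = value
    return merged


# Example: the live Service carries a Rancher annotation the operator does not know about.
current = {
    "service.beta.kubernetes.io/azure-load-balancer-internal": "true",
    "field.cattle.io/publicEndpoints": "[...]",  # placeholder value
    "example.com/unrelated": "x",                # hypothetical, not allow-listed
}
desired = {"service.beta.kubernetes.io/azure-load-balancer-internal": "true"}
merged = merge_annotations(current, desired, ["field.cattle.io/*"])
```

With the Rancher annotation preserved in the patch, a later reconciliation would find nothing to remove, so the two controllers would no longer trigger each other.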

mluiten commented, Dec 7, 2020 (0 reactions)

@scholzj if you can point me in the right direction, I would be eager to take a look and try to make an improvement.

Why does Strimzi want to remove annotations that have nothing to do with Strimzi? Is there a whitelist of annotations it should care about?
