
VPN reset causes all pods to become not ready -> Unknown -> Ready


We are running MicroK8s, but our applications need a VPN to connect securely with each other. For this, in some situations we have an Ubuntu VM with an active OpenVPN connection, and MicroK8s is installed on this VM.

However, if something changes in this VPN (like a reconnect), the pods in our cluster first become not ready, and after a few minutes their status changes to Unknown. After approximately 20-30 minutes the issue slowly resolves itself and all pods become Ready again (and the application becomes accessible).

Is this a known and/or expected issue?

I tried to run microk8s inspect during this issue, but it takes a very long time to run. The result is attached: inspection-report-20210316_113756.tar.zip
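
For anyone trying to reproduce this, a minimal sketch of how one might watch the node and pod state while the VPN reconnects (standard MicroK8s/kubectl commands; the node name argument is just the local hostname and may differ on your machine):

    # Watch the node flip between Ready / NotReady / Unknown during the VPN reset
    microk8s kubectl get nodes -w

    # In a second terminal, watch pod status across all namespaces
    microk8s kubectl get pods -A -w

    # Afterwards, inspect the node conditions and recent events for the reported reason
    microk8s kubectl describe node $(hostname)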

Thanks for your help

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 12 (4 by maintainers)

Top GitHub Comments

2 reactions
neoaggelos commented, Nov 23, 2022

Hi @devZer0

I’m going to attempt to give some context around this issue, how it relates to MicroK8s and why it should not be an issue any longer in newer MicroK8s versions (1.22+). Happy to discuss this further if this is still an issue for you or anyone else.

First of all, at the time when this issue was created, MicroK8s was mostly meant as a developer tool, so the MicroK8s team added as many ease-of-use hooks as possible to make basic usage friction-free. One of them was ensuring that the kube-apiserver certificates included all the IP addresses of the host machine. Developer machines do not have a static IP address, and they may even move between environments (e.g. home <–> office). Therefore, MicroK8s ran a quick check (hostname -I) to detect address changes and automatically refresh the certificates as needed.
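
As a minimal sketch of what that check amounts to, one can compare the host's current IP addresses against the SANs baked into the apiserver certificate; the certificate path below assumes the usual MicroK8s snap layout and is not taken from the issue itself:

    # IP addresses currently assigned to the host (the same signal MicroK8s used)
    hostname -I

    # SANs embedded in the kube-apiserver serving certificate (path assumed for a snap install)
    sudo openssl x509 -in /var/snap/microk8s/current/certs/server.crt -noout -text \
        | grep -A1 'Subject Alternative Name'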

One side-effect of this was that the API server had to be restarted for the new certificates to take effect. In MicroK8s 1.21 and earlier, this had a domino effect: restarting the API server would take down the kubelet and the container runtime, which in turn killed all cluster workloads. This is the behavior described in the original issue, with Pods going Ready -> Unknown -> …
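
If you want to verify whether this chain of restarts happened on an affected node, here is a hedged sketch; the service name is an assumption (newer MicroK8s releases bundle the Kubernetes components into a single kubelite daemon, while older ones ran separate daemons such as daemon-apiserver and daemon-kubelet):

    # List the MicroK8s snap services and their current state
    snap services microk8s

    # Check whether the bundled daemon stopped/started around the time of the VPN reconnect
    sudo journalctl -u snap.microk8s.daemon-kubelite --since "1 hour ago" | grep -iE 'starting|stopping|started|stopped'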

This has not been the case for quite some time now. Starting from MicroK8s 1.22 onwards, kube-apiserver restarts do not kill the cluster workloads, so the issue described above would not occur at all. Further, there have been two additions to help mitigate this for deployments where it may be problematic:

  • This behavior is disabled by default in MicroK8s clusters. That is, for nodes that are part of a cluster, this mechanism is turned off, and certificate updates to add new IPs are performed manually by the cluster administrator.
  • This behavior can also be disabled for single-node deployments by creating a file called /var/snap/microk8s/current/var/lock/no-cert-reissue (see the sketch after this list).
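
A minimal sketch of the second option; the path is the one quoted above, and creating or removing the lock file is the only action involved:

    # Disable the automatic certificate refresh on a single-node install
    sudo touch /var/snap/microk8s/current/var/lock/no-cert-reissue

    # Remove the lock file later to re-enable the automatic refresh
    sudo rm /var/snap/microk8s/current/var/lock/no-cert-reissue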

To reiterate, this should no longer restart workloads in MicroK8s 1.22 (released August 2021) or newer. Are you still affected by this issue? If so, let's keep this discussion going; we're keen to see what we can improve in MicroK8s for this.
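
To check whether you are on a release that includes this change, a quick sketch using the snap tooling (MicroK8s is distributed as a snap; the grep is only there to shorten the output):

    # Show the installed MicroK8s revision and version
    snap list microk8s

    # Show which channel (e.g. 1.22/stable or newer) the installation is tracking
    snap info microk8s | grep -i tracking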

Also of note, there are quite a few duplicate GitHub issues for this specific problem, so it's only natural that some are missed and not updated. Apologies for this; I can assure you that it is in the team's best interest to improve on this going forward.

1 reaction
neoaggelos commented, Nov 23, 2022

I agree with you in general. However, given the number of duplicate issues (for example, a recent one being https://github.com/canonical/microk8s/issues/3575), it makes sense to keep open the issues that are “actionable”, or the ones where the original poster comes back with more information and feedback so that the issue can be resolved.

To be honest, I very much agree with your sentiment: The measure of success for a project is not the number of closed issues, but rather the engagement with the community and the resolution of ongoing problems users are having.


Top Results From Across the Web

  • How to Fix Kubernetes 'Node Not Ready' Error - Komodor: All stateful pods running on the node then become unavailable. Common reasons for a Kubernetes node not ready error include lack of resources...
  • Pods restart frequently causing periodic timeout errors - IBM: This issue can occur due to frequent failing readiness probes for a pod. When the pod becomes 'not ready', you might not be...
  • How to change status of nodes to ready in EKS - Bobcares: Let's see what our Support Techs have to say about changing the status of the node from Unknown or NotReady status to Ready...
  • Troubleshooting | Google Kubernetes Engine (GKE): Get the pid of any container process (so NOT docker-containerd-shim) for the Pod. From the above example: 1283107 - pause; 1283169 -...
