VPN reset causes all pods to become not ready -> Unknown -> Ready
We are running MicroK8s, but our applications need a VPN to connect securely with each other. In some situations we therefore use an Ubuntu VM with an active OpenVPN connection, and MicroK8s is installed on this VM.
However, if something changes in this VPN (like a reconnect), the pods in our cluster first become NotReady, and after a few minutes their status changes to Unknown. After approximately 20-30 minutes the issue slowly resolves itself and all pods become Ready again (and the application becomes accessible).
Is this a known and/or expected issue?
I tried to run `microk8s inspect` during this issue, but it takes very long to run. The result is attached.
inspection-report-20210316_113756.tar.zip
Thanks for your help
Issue Analytics
- Created: 3 years ago
- Comments: 12 (4 by maintainers)
Top GitHub Comments
Hi @devZer0
I’m going to attempt to give some context around this issue, how it relates to MicroK8s and why it should not be an issue any longer in newer MicroK8s versions (1.22+). Happy to discuss this further if this is still an issue for you or anyone else.
First of all, at the time this issue was created, MicroK8s was mostly meant as a developer tool, so the MicroK8s team integrated a number of ease-of-use hooks to keep basic usage friction-free. One of them ensured that the kube-apiserver certificates included all the IP addresses of the host machine. Developer machines do not have a static IP address, and they may even move between environments (e.g. home <–> office). Therefore, MicroK8s ran a quick check (`hostname -I`) to detect address changes and automatically refresh the certificates as needed.

One side-effect of this was that the API server had to be restarted for the new certificates to take effect. In MicroK8s 1.21 and earlier, this had a domino effect: the restart would take down kubelet and the container runtime, which in turn killed all cluster workloads. This is the behavior described in the original issue, with Pods cycling through NotReady -> Unknown -> Ready.
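The address check described above can be sketched roughly as follows. This is a minimal illustration in the spirit of what MicroK8s did, not the actual MicroK8s code; the `ips_changed` function and the recorded-IP list are invented for the example:

```shell
#!/bin/sh
# Sketch of an IP-change check (illustrative, not MicroK8s internals).

ips_changed() {
    # Compare two space-separated IP lists, ignoring order.
    # Word-splitting of $1/$2 is intentional: printf then emits one IP per line.
    [ "$(printf '%s\n' $1 | sort)" != "$(printf '%s\n' $2 | sort)" ]
}

recorded="10.0.0.5 192.168.1.10"   # hypothetical IPs baked into the current certs
# Fall back to the recorded list if `hostname -I` is unavailable on this system:
current="$(hostname -I 2>/dev/null || echo "$recorded")"

if ips_changed "$recorded" "$current"; then
    echo "IP set changed: certs would be reissued and kube-apiserver restarted"
else
    echo "no change"
fi
```

Because the serving certificates are loaded when kube-apiserver starts, picking up new SANs required a restart, which is where the cascade on 1.21 and earlier began.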
This has not been the case for quite some time now. Starting from MicroK8s 1.22 onwards, kube-apiserver restarts do not kill the cluster workloads, so the issue described above would not occur at all. Further, there have been two additions to help mitigate this for deployments where it may be problematic:
- `/var/snap/microk8s/current/var/lock/no-cert-reissue` — creating this lock file disables the automatic certificate refresh (and the API server restart it triggers) entirely.
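For example, on a host like the one described (where the VPN changes the IP set regularly), the refresh can be opted out of by creating the lock file from the comment above:

```shell
# Stop MicroK8s from reissuing certificates when the host's IP
# addresses change (e.g. on VPN reconnects):
sudo touch /var/snap/microk8s/current/var/lock/no-cert-reissue

# Remove the lock file later to restore the default behavior:
# sudo rm /var/snap/microk8s/current/var/lock/no-cert-reissue
```

Note that with the lock file in place, the certificates will no longer track new host addresses, so clients must reach the API server via an address already in the certificate.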
To reiterate, this should no longer restart workloads on MicroK8s 1.22 (released August 2021) or newer. Are you still affected by this issue? If so, please let's keep this discussion going; we're keen to see what we can improve in MicroK8s here.
Also of note, there are quite a few duplicate GitHub issues for this specific problem, so it's only logical that some are missed and not updated. Apologies for this; I can assure you it is in the best interest of the team to improve on this going forward.
I agree with you in general. However, given the number of duplicate issues (for example, a recent one being https://github.com/canonical/microk8s/issues/3575), it would make sense to keep the issues that are "actionable", or the ones where the original poster is coming back with more information and giving feedback so that the issue can be resolved.
To be honest, I very much agree with your sentiment: The measure of success for a project is not the number of closed issues, but rather the engagement with the community and the resolution of ongoing problems users are having.