Pods stuck in unknown state after reboot
I’m using latest/edge (with the Calico CNI) and after rebooting the machine I’m getting all pods in Unknown state.
Logs of the calico node:
2020-08-27 07:46:50.152 [INFO][8] startup.go 290: Early log level set to info
2020-08-27 07:46:50.152 [INFO][8] startup.go 306: Using NODENAME environment for node name
2020-08-27 07:46:50.152 [INFO][8] startup.go 318: Determined node name: davigar15
2020-08-27 07:46:50.153 [INFO][8] startup.go 350: Checking datastore connection
2020-08-27 07:46:50.159 [INFO][8] startup.go 374: Datastore connection verified
2020-08-27 07:46:50.159 [INFO][8] startup.go 102: Datastore is ready
2020-08-27 07:46:50.170 [INFO][8] startup.go 652: Using autodetected IPv4 address on interface lxdbr0: 172.16.100.1/24
2020-08-27 07:46:50.170 [INFO][8] startup.go 715: No AS number configured on node resource, using global value
2020-08-27 07:46:50.170 [INFO][8] startup.go 171: Setting NetworkUnavailable to False
2020-08-27 07:46:50.191 [INFO][8] startup.go 764: found v6= in the kubeadm config map
2020-08-27 07:46:50.210 [INFO][8] startup.go 598: FELIX_IPV6SUPPORT is false through environment variable
2020-08-27 07:46:50.232 [INFO][8] startup.go 215: Using node name: davigar15
2020-08-27 07:46:50.274 [INFO][32] allocateip.go 144: Current address is still valid, do nothing currentAddr="10.1.245.64" type="vxlanTunnelAddress"
CALICO_NETWORKING_BACKEND is vxlan - no need to run a BGP daemon
Calico node started successfully
An interesting thing is that Calico is autodetecting the network interface used by LXD (lxdbr0).
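Since Calico picked the LXD bridge, one alternative way to steer autodetection (an assumption about a possible fix, not what the reporter tried) is Calico's skip-interface method, set on the running DaemonSet with kubectl set env:

```shell
# Assumption: calico-node runs as a DaemonSet in kube-system, as in the
# stock MicroK8s cni.yaml. skip-interface tells Calico to ignore lxdbr0
# when autodetecting the node's IPv4 address; changing the env var
# triggers a rolling restart of the calico-node pods.
microk8s.kubectl set env daemonset/calico-node -n kube-system \
    IP_AUTODETECTION_METHOD=skip-interface=lxdbr0
```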
Following @ktsakalozos’s suggestion, I added this to /var/snap/microk8s/current/args/cni-network/cni.yaml and applied that spec:
- name: IP_AUTODETECTION_METHOD
value: "can-reach=192.168.0.0"
The calico-node pod did not restart on its own, so I killed it to force a restart, but it did not come back up, even after microk8s.stop && microk8s.start.
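For reference, editing cni.yaml alone restarts nothing; a minimal sketch of applying the edited manifest and cycling the calico-node pods, assuming the stock MicroK8s path and the standard k8s-app=calico-node label:

```shell
# Re-apply the edited manifest, then delete the calico-node pods so the
# DaemonSet controller recreates them with the new IP_AUTODETECTION_METHOD.
microk8s.kubectl apply -f /var/snap/microk8s/current/args/cni-network/cni.yaml
microk8s.kubectl delete pod -n kube-system -l k8s-app=calico-node
```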
This is the tarball generated by microk8s.inspect
Issue Analytics
- State:
- Created 3 years ago
- Reactions: 1
- Comments: 18 (5 by maintainers)
Top Results From Across the Web
Pods stuck in unknown state after reboot · Issue #1520 - GitHub
and after rebooting the machine I'm getting all pods in Unknown state. An interesting this if that calico is detecting the network used...
Pods stuck at 'Unknown' status after node goes down - Reddit
One thing I noticed, is that when a node goes down (and the cluster reports its status as 'NotReady'), its pods get stuck...
StatefulSet: pods stuck in unknown state - Stack Overflow
The Pods running on an unreachable Node enter the 'Terminating' or 'Unknown' state after a timeout. Pods may also enter these states when...
1505687 – Pods in unknown state, cannot be forcibly deleted.
Description of problem: When trying to evacuate pods from a node that was going on maintenance, some pods were stuck in "Terminating" state....
Pod Stuck in Pending State – Runbooks - GitHub Pages
If the kubelet is not running on the node the pod has been assigned to, this error may be seen. You can check...
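Several of the results above describe pods stuck in Unknown or Terminating after a node outage. A hedged sketch of inspecting and force-deleting such a pod (the pod and namespace names are placeholders, not values from this issue):

```shell
# List pods whose phase is not Running, then force-delete a stuck one.
# --grace-period=0 --force removes the Pod object from the API server
# without waiting for the (possibly unreachable) kubelet to confirm
# termination, which is why it clears Unknown/Terminating pods.
microk8s.kubectl get pods -A --field-selector=status.phase!=Running
microk8s.kubectl delete pod <pod-name> -n <namespace> --grace-period=0 --force
```

Note that force deletion only removes the API object; any containers still running on the node are not cleaned up until the kubelet returns.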
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@ktsakalozos Currently working on a fix for Juju to resolve this. Have updated lp bug.
Thanks for your reply. No, I don’t think so, since both of my CentOS 7 servers run the same kernel, kernel-3.10.0-1160.el7.x86_64. One server is connected to the internet and the other is completely isolated; I have done nothing on it, and after a reboot this error happened unexpectedly. I don’t think it is related to a kernel/containerd version incompatibility introduced by the runc vulnerability mitigations. The reason I think so is that on the same machine where microk8s (or rather, the containerd running inside microk8s) gives the error “can’t copy bootstrap data to pipe: write init-p: broken pipe”, Docker can still create containers, e.g.:
#docker run -itd busybox:latest
#docker ps
CONTAINER ID   IMAGE            COMMAND   CREATED         STATUS         PORTS   NAMES
7328b5736817   busybox:latest   "sh"      3 seconds ago   Up 3 seconds           flamboyant_sanderson
The environment is the same, so if a container can be created with Docker on the same machine, with the 3.x kernel and the docker-ce and containerd versions below, then we can rule out the suggested fix of “upgrade the kernel version or downgrade the Docker and containerd versions”. I think this problem is specific to microk8s and the containerd inside it; it seems this issue affects Ubuntu as well: https://github.com/canonical/microk8s/issues/531
#uname -sr
Linux 3.10.0-1160.49.1.el7.x86_64
#docker version
Client:
 Version:           18.09.0
 API version:       1.39
 Go version:        go1.10.4
 Git commit:        4d60db4
 Built:             Wed Nov 7 00:48:22 2018
 OS/Arch:           linux/amd64
 Experimental:      false
Server: Docker Engine - Community
 Engine:
  Version:          18.09.0
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.4
  Git commit:       4d60db4
  Built:            Wed Nov 7 00:19:08 2018
  OS/Arch:          linux/amd64
  Experimental:     false
#yum info containerd
Installed Packages
Name        : containerd.io
Arch        : x86_64
Version     : 1.6.9
Release     : 3.1.el7
Size        : 112 M
Repo        : installed
From repo   : docker-ce-stable
Summary     : An industry-standard container runtime
URL         : https://containerd.io
#rpm -qa kernel
kernel-3.10.0-1160.el7.x86_64
kernel-3.10.0-1160.49.1.el7.x86_64
#rpm -qa | grep -i kernel
kernel-tools-libs-3.10.0-1160.49.1.el7.x86_64
kernel-tools-3.10.0-1160.49.1.el7.x86_64
kernel-3.10.0-1160.el7.x86_64
kernel-headers-3.10.0-1160.49.1.el7.x86_64
kernel-3.10.0-1160.49.1.el7.x86_64
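To compare like with like, the same busybox test can be run against the containerd bundled with MicroK8s rather than docker-ce's; a sketch, assuming the microk8s ctr wrapper is available (the container ID `test-bb` is an arbitrary placeholder):

```shell
# Pull and run busybox with MicroK8s' own containerd. If this fails with
# "write init-p: broken pipe" while plain Docker on the same host succeeds,
# that points at the bundled containerd/runc rather than the 3.10 kernel.
microk8s.ctr images pull docker.io/library/busybox:latest
microk8s.ctr run --rm docker.io/library/busybox:latest test-bb echo ok
```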