
Pods stuck in unknown state after reboot

See original GitHub issue

I’m using latest/edge (with the Calico CNI), and after rebooting the machine all of my pods are in the Unknown state.
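A quick listing shows the symptom; a minimal sketch (the microk8s.kubectl wrapper is standard, the namespaces and pod names will of course differ per cluster):

# List every pod in every namespace plus the node it was scheduled on;
# after the reboot they all report STATUS Unknown
microk8s.kubectl get pods --all-namespaces -o wide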

Logs of the calico node:

2020-08-27 07:46:50.152 [INFO][8] startup.go 290: Early log level set to info
2020-08-27 07:46:50.152 [INFO][8] startup.go 306: Using NODENAME environment for node name
2020-08-27 07:46:50.152 [INFO][8] startup.go 318: Determined node name: davigar15
2020-08-27 07:46:50.153 [INFO][8] startup.go 350: Checking datastore connection
2020-08-27 07:46:50.159 [INFO][8] startup.go 374: Datastore connection verified
2020-08-27 07:46:50.159 [INFO][8] startup.go 102: Datastore is ready
2020-08-27 07:46:50.170 [INFO][8] startup.go 652: Using autodetected IPv4 address on interface lxdbr0: 172.16.100.1/24
2020-08-27 07:46:50.170 [INFO][8] startup.go 715: No AS number configured on node resource, using global value
2020-08-27 07:46:50.170 [INFO][8] startup.go 171: Setting NetworkUnavailable to False
2020-08-27 07:46:50.191 [INFO][8] startup.go 764: found v6= in the kubeadm config map
2020-08-27 07:46:50.210 [INFO][8] startup.go 598: FELIX_IPV6SUPPORT is false through environment variable
2020-08-27 07:46:50.232 [INFO][8] startup.go 215: Using node name: davigar15
2020-08-27 07:46:50.274 [INFO][32] allocateip.go 144: Current address is still valid, do nothing currentAddr="10.1.245.64" type="vxlanTunnelAddress"
CALICO_NETWORKING_BACKEND is vxlan - no need to run a BGP daemon
Calico node started successfully
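For reference, a log excerpt like the one above can be pulled straight from the calico-node pod; a minimal sketch, assuming the k8s-app=calico-node label used by the stock Calico manifest:

# Tail the calico-node logs in kube-system (label selector per the stock manifest)
microk8s.kubectl -n kube-system logs -l k8s-app=calico-node --tail=100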

An interesting thing is that Calico is autodetecting the network used for LXD (the lxdbr0 bridge, 172.16.100.1/24, as shown in the log above).
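A quick way to double-check which address Calico settled on is to compare the host interfaces with the annotation Calico writes onto the node object (a sketch; projectcalico.org/IPv4Address is the annotation stock Calico uses, and davigar15 is the node name from the log above):

# Host interfaces Calico could have autodetected, including the LXD bridge lxdbr0
ip -4 addr show

# Address Calico actually recorded on the node
microk8s.kubectl describe node davigar15 | grep projectcalico.org/IPv4Address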

Following @ktsakalozos’s suggestion, I added the following to /var/snap/microk8s/current/args/cni-network/cni.yaml and applied that spec.

            - name: IP_AUTODETECTION_METHOD
              value: "can-reach=192.168.0.0"

The calico node did not restart, so I killed it to force a restart, but it did not come back up even after microk8s.stop && microk8s.start.
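An alternative to killing the pod by hand would be a rollout restart of the DaemonSet, so every calico-node pod picks up the new environment variable; a hedged sketch, assuming the DaemonSet keeps the stock name calico-node:

# Restart all calico-node pods in place
microk8s.kubectl -n kube-system rollout restart daemonset/calico-node

# Watch them come back up
microk8s.kubectl -n kube-system get pods -l k8s-app=calico-node -w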

This is the tarball generated by microk8s.inspect

inspection-report.zip

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 18 (5 by maintainers)

Top GitHub Comments

1 reaction
tlm commented, Oct 12, 2020

@ktsakalozos Currently working on a fix for Juju to resolve this. Have updated lp bug.

0 reactions
javad87 commented, Nov 8, 2022

Any idea how I can resolve it? I don’t think it is related to the Docker, containerd, or kernel version, since it was working perfectly for 7-8 months!

Is it possible that the reboot caused the system to start from another kernel?

Thanks for your reply: No, I don’t think so, since there is only one kernel on both of my CentOS 7 servers, kernel-3.10.0-1160.el7.x86_64. One server is connected to the internet and the other is completely isolated; I have done nothing on it, and after a reboot this error happened unexpectedly. I don’t think it is related to a kernel/containerd version incompatibility introduced by the runc vulnerability measures. The reason I think so is that Docker on the same machine where microk8s (or rather the containerd running inside microk8s) gives the error “can’t copy bootstrap data to pipe: write init-p: broken pipe” can still create containers, e.g.:

# docker run -itd busybox:latest
# docker ps
CONTAINER ID   IMAGE            COMMAND   CREATED         STATUS         PORTS   NAMES
7328b5736817   busybox:latest   "sh"      3 seconds ago   Up 3 seconds           flamboyant_sanderson
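To make the comparison apples-to-apples, the same smoke test can be pointed at the containerd instance bundled with MicroK8s (a sketch; microk8s.ctr is the wrapper MicroK8s ships, and the image and container name here are arbitrary):

# Pull a test image through MicroK8s' bundled containerd
microk8s.ctr images pull docker.io/library/busybox:latest

# Try to create and start a container with it; if the bundled runtime is the
# problem, this should reproduce the "write init-p: broken pipe" failure
microk8s.ctr run --rm -t docker.io/library/busybox:latest bb-test sh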

The environment is the same, so if a container can be created with Docker on the same machine, with the 3.x kernel and the docker-ce and containerd versions shown below, then we can rule out the suggestion that “upgrading the kernel or downgrading the Docker and containerd versions will solve the problem.” I think this problem is related to microk8s and the containerd inside it; it seems this issue exists on Ubuntu as well: https://github.com/canonical/microk8s/issues/531

# uname -sr
Linux 3.10.0-1160.49.1.el7.x86_64

# docker version
Client:
 Version:           18.09.0
 API version:       1.39
 Go version:        go1.10.4
 Git commit:        4d60db4
 Built:             Wed Nov 7 00:48:22 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.0
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.4
  Git commit:       4d60db4
  Built:            Wed Nov 7 00:19:08 2018
  OS/Arch:          linux/amd64
  Experimental:     false

# yum info containerd
Installed Packages
Name        : containerd.io
Arch        : x86_64
Version     : 1.6.9
Release     : 3.1.el7
Size        : 112 M
Repo        : installed
From repo   : docker-ce-stable
Summary     : An industry-standard container runtime
URL         : https://containerd.io

# rpm -qa kernel
kernel-3.10.0-1160.el7.x86_64
kernel-3.10.0-1160.49.1.el7.x86_64

# rpm -qa | grep -i kernel
kernel-tools-libs-3.10.0-1160.49.1.el7.x86_64
kernel-tools-3.10.0-1160.49.1.el7.x86_64
kernel-3.10.0-1160.el7.x86_64
kernel-headers-3.10.0-1160.49.1.el7.x86_64
kernel-3.10.0-1160.49.1.el7.x86_64
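For completeness, the matching version and log information on the MicroK8s side can be gathered roughly like this (a sketch; the snap and service names are the usual MicroK8s ones, adjust if your install differs):

# MicroK8s snap revision and channel
snap list microk8s

# containerd client/server versions as seen through MicroK8s
microk8s.ctr version

# Recent logs from MicroK8s' containerd service, where the
# "write init-p: broken pipe" error should appear
journalctl -u snap.microk8s.daemon-containerd --since "1 hour ago" | tail -n 50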

Read more comments on GitHub >

Top Results From Across the Web

  • Pods stuck in unknown state after reboot · Issue #1520 - GitHub
  • Pods stuck at 'Unknown' status after node goes down - Reddit
  • StatefulSet: pods stuck in unknown state - Stack Overflow
  • 1505687 – Pods in unknown state, cannot be forcibly deleted.
  • Pod Stuck in Pending State – Runbooks - GitHub Pages
