Flannel on MicroK8s 1.18.9 running out of IP addresses when creating pods
network: failed to allocate for range 0: no IP addresses available in range set: 10.1.78.1-10.1.78.254
Flannel is logging some errors when connecting to etcd, but this is not keeping it from doing its job:
Service for snap application microk8s.daemon-flanneld
Loaded: loaded (/etc/systemd/system/snap.microk8s.daemon-flanneld.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2020-09-26 00:35:20 UTC; 1 weeks 4 days ago
Main PID: 51073 (flanneld)
Tasks: 40 (limit: 19660)
CGroup: /system.slice/snap.microk8s.daemon-flanneld.service
└─51073 /snap/microk8s/1702/opt/cni/bin/flanneld --iface= --etcd-endpoints=https://127.0.0.1:12379 --etcd-cafile=/var/snap/microk8s/1702/certs/ca.crt --etcd-certfile=/var/snap/microk8s/1702/certs/server.crt --etcd-keyfil
Oct 06 23:44:09 adoagent-ADOPRTests000047 microk8s.daemon-flanneld[51073]: E1006 23:44:09.842382 51073 watch.go:171] Subnet watch failed: client: etcd cluster is unavailable or misconfigured; error #0: unexpected EOF
Oct 06 23:44:09 adoagent-ADOPRTests000047 microk8s.daemon-flanneld[51073]: E1006 23:44:09.842435 51073 watch.go:43] Watch subnets: client: etcd cluster is unavailable or misconfigured; error #0: unexpected EOF
Oct 06 23:45:56 adoagent-ADOPRTests000047 microk8s.daemon-flanneld[51073]: E1006 23:45:56.888018 51073 watch.go:43] Watch subnets: client: etcd cluster is unavailable or misconfigured; error #0: unexpected EOF
Oct 06 23:45:56 adoagent-ADOPRTests000047 microk8s.daemon-flanneld[51073]: E1006 23:45:56.888019 51073 watch.go:171] Subnet watch failed: client: etcd cluster is unavailable or misconfigured; error #0: unexpected EOF
Oct 07 12:35:31 adoagent-ADOPRTests000047 microk8s.daemon-flanneld[51073]: I1007 12:35:31.048926 51073 main.go:421] Lease renewed, new expiration: 2020-10-08 12:35:31.04249333 +0000 UTC
Oct 07 12:35:31 adoagent-ADOPRTests000047 microk8s.daemon-flanneld[51073]: I1007 12:35:31.048987 51073 main.go:429] Waiting for 22h59m59.993509265s to renew lease
Oct 07 16:52:36 adoagent-ADOPRTests000047 microk8s.daemon-flanneld[51073]: E1007 16:52:36.409586 51073 watch.go:171] Subnet watch failed: client: etcd cluster is unavailable or misconfigured; error #0: unexpected EOF
Oct 07 16:52:36 adoagent-ADOPRTests000047 microk8s.daemon-flanneld[51073]: E1007 16:52:36.409622 51073 watch.go:43] Watch subnets: client: etcd cluster is unavailable or misconfigured; error #0: unexpected EOF
However, microk8s.inspect shows flannel as running:
microk8s.inspect
Inspecting Certificates
Inspecting services
Service snap.microk8s.daemon-cluster-agent is running
Service snap.microk8s.daemon-flanneld is running
Service snap.microk8s.daemon-containerd is running
Service snap.microk8s.daemon-apiserver is running
Testing etcd with the same credentials given to flanneld shows that it is functioning properly:
etcdctl --endpoints https://127.0.0.1:12379 --ca-file=/var/snap/microk8s/1702/certs/ca.crt --cert-file=/var/snap/microk8s/1702/certs/server.crt --key-file=/var/snap/microk8s/1702/certs/server.key --debug cluster-health
Cluster-Endpoints: https://127.0.0.1:12379
cURL Command: curl -X GET https://127.0.0.1:12379/v2/members
member 8e9e05c52164694d is healthy: got healthy result from https://10.3.0.121:12379
cluster is healthy
Looking at the list of IP addresses that flanneld has assigned, we can see that the count is already at 255 - 1 (254), which means the range is maxed out.
ls /var/lib/cni/networks/microk8s-flannel-network | wc
255 255 2954
However, only 50 pods were running, which means flanneld is not cleaning up the IP addresses that are no longer used by running pods so that they can be reused.
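To compare allocations against running pods, it can help to count only the files that look like IP reservations rather than every directory entry. The sketch below assumes the host-local IPAM layout, where each allocated address is a file named after the IP and bookkeeping files such as last_reserved_ip.0 or lock may sit alongside them. count_reserved_ips is a hypothetical helper name, not part of MicroK8s or flannel.

```shell
# Count the per-IP reservation files in a host-local IPAM network directory.
# Assumption: reservation files are named like IPv4 addresses (e.g. 10.1.78.5);
# anything else (last_reserved_ip.0, lock, ...) is treated as bookkeeping.
count_reserved_ips() {
  dir=$1
  count=0
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    name=${f##*/}
    case $name in
      *[!0-9.]*|*.*.*.*.*) ;;  # contains non-digit/dot characters, or too many dots
      [0-9]*.[0-9]*.[0-9]*.[0-9]*) count=$((count + 1)) ;;
    esac
  done
  echo "$count"
}
```

For example, `count_reserved_ips /var/lib/cni/networks/microk8s-flannel-network` prints the number of reserved addresses, which can then be compared with the number of running pods and with the size of the 10.1.78.1-10.1.78.254 range.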
I ran this script, which cleans up the IP addresses not used by a Docker container:
cd /var/lib/cni/networks/microk8s-flannel-network
for hash in $(tail -n +1 * | grep '^[A-Za-z0-9]*$' | cut -c 1-8); do if [ -z "$(docker ps -a | grep "$hash" | awk '{print $1}')" ]; then grep -ir "$hash" ./ | awk -F: '{print $1}'; fi; done | xargs -r rm
Running this script, which removed the files in /var/lib/cni/networks/microk8s-flannel-network that did not correspond to a running container ID, fixed the problem.
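Before deleting anything, a dry run that only lists the stale reservation files may be safer. The following is a minimal sketch, assuming each reservation file stores the full container ID on its first line and that a list of live container IDs (one per line) has been written to a file beforehand. list_stale_reservations and the live-ID file are hypothetical, introduced here only for illustration.

```shell
# Print reservation files whose recorded container ID is not in the live list.
#   $1: host-local IPAM network directory
#   $2: file with one full container ID per line (hypothetical input)
list_stale_reservations() {
  dir=$1
  live_ids=$2
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    case ${f##*/} in
      last_reserved_ip.*|lock) continue ;;  # bookkeeping files, not reservations
    esac
    # Assumption: the first line of a reservation file is the container ID;
    # strip a carriage return in case the file uses CRLF line breaks.
    id=$(head -n 1 "$f" | tr -d '\r')
    [ -n "$id" ] || continue
    # Exact, fixed-string, whole-line match against the live ID list.
    if ! grep -qxF -- "$id" "$live_ids"; then
      printf '%s\n' "$f"
    fi
  done
}

# Example (not run here): collect live IDs from Docker, then list stale files.
#   docker ps -aq --no-trunc > /tmp/live_ids
#   list_stale_reservations /var/lib/cni/networks/microk8s-flannel-network /tmp/live_ids
```

Only once the printed list looks right would the paths be passed on to rm. Matching on the full ID rather than an 8-character prefix avoids accidental partial matches.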
Any idea on what the root cause is?
Top GitHub Comments
@akanso I think it's OK to use docker instead of containerd. As far as I can tell, the flannel data directory is stored in ${SNAP_COMMON}/var/lib/cni/flannel, where $SNAP_COMMON is /var/snap/microk8s/common, and not in /var/lib/cni/networks/microk8s-flannel-network. Would it be possible to attach the inspect tarball? Thanks
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.