Flannel on MicroK8s 1.18.9 running out of IP addresses when creating pods

network: failed to allocate for range 0: no IP addresses available in range set: 10.1.78.1-10.1.78.254

Flannel logs some errors connecting to etcd, but this is not keeping it from doing its job:

Service for snap application microk8s.daemon-flanneld
   Loaded: loaded (/etc/systemd/system/snap.microk8s.daemon-flanneld.service; enabled; vendor preset: enabled)
   Active: active (running) since Sat 2020-09-26 00:35:20 UTC; 1 weeks 4 days ago
 Main PID: 51073 (flanneld)
    Tasks: 40 (limit: 19660)
   CGroup: /system.slice/snap.microk8s.daemon-flanneld.service
           └─51073 /snap/microk8s/1702/opt/cni/bin/flanneld --iface= --etcd-endpoints=https://127.0.0.1:12379 --etcd-cafile=/var/snap/microk8s/1702/certs/ca.crt --etcd-certfile=/var/snap/microk8s/1702/certs/server.crt --etcd-keyfil

Oct 06 23:44:09 adoagent-ADOPRTests000047 microk8s.daemon-flanneld[51073]: E1006 23:44:09.842382   51073 watch.go:171] Subnet watch failed: client: etcd cluster is unavailable or misconfigured; error #0: unexpected EOF
Oct 06 23:44:09 adoagent-ADOPRTests000047 microk8s.daemon-flanneld[51073]: E1006 23:44:09.842435   51073 watch.go:43] Watch subnets: client: etcd cluster is unavailable or misconfigured; error #0: unexpected EOF
Oct 06 23:45:56 adoagent-ADOPRTests000047 microk8s.daemon-flanneld[51073]: E1006 23:45:56.888018   51073 watch.go:43] Watch subnets: client: etcd cluster is unavailable or misconfigured; error #0: unexpected EOF
Oct 06 23:45:56 adoagent-ADOPRTests000047 microk8s.daemon-flanneld[51073]: E1006 23:45:56.888019   51073 watch.go:171] Subnet watch failed: client: etcd cluster is unavailable or misconfigured; error #0: unexpected EOF
Oct 07 12:35:31 adoagent-ADOPRTests000047 microk8s.daemon-flanneld[51073]: I1007 12:35:31.048926   51073 main.go:421] Lease renewed, new expiration: 2020-10-08 12:35:31.04249333 +0000 UTC
Oct 07 12:35:31 adoagent-ADOPRTests000047 microk8s.daemon-flanneld[51073]: I1007 12:35:31.048987   51073 main.go:429] Waiting for 22h59m59.993509265s to renew lease
Oct 07 16:52:36 adoagent-ADOPRTests000047 microk8s.daemon-flanneld[51073]: E1007 16:52:36.409586   51073 watch.go:171] Subnet watch failed: client: etcd cluster is unavailable or misconfigured; error #0: unexpected EOF
Oct 07 16:52:36 adoagent-ADOPRTests000047 microk8s.daemon-flanneld[51073]: E1007 16:52:36.409622   51073 watch.go:43] Watch subnets: client: etcd cluster is unavailable or misconfigured; error #0: unexpected EOF

However, microk8s.inspect shows flannel as running:

microk8s.inspect
Inspecting Certificates
Inspecting services
  Service snap.microk8s.daemon-cluster-agent is running
  Service snap.microk8s.daemon-flanneld is running
  Service snap.microk8s.daemon-containerd is running
  Service snap.microk8s.daemon-apiserver is running

Testing etcd with the same credentials given to flanneld shows that it is functioning properly:

etcdctl --endpoints https://127.0.0.1:12379 --ca-file=/var/snap/microk8s/1702/certs/ca.crt --cert-file=/var/snap/microk8s/1702/certs/server.crt  --key-file=/var/snap/microk8s/1702/certs/server.key --debug cluster-health

Cluster-Endpoints: https://127.0.0.1:12379
cURL Command: curl -X GET https://127.0.0.1:12379/v2/members
member 8e9e05c52164694d is healthy: got healthy result from https://10.3.0.121:12379
cluster is healthy

Listing the IP addresses that flanneld has assigned shows that the count is already at 255 - 1 = 254, which means the range 10.1.78.1-10.1.78.254 is exhausted.

ls /var/lib/cni/networks/microk8s-flannel-network | wc
    255     255    2954
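The raw `wc` count includes every entry in the directory, not only address files. A minimal sketch of a stricter count follows, assuming the directory holds one file per allocated address, named after the IP, plus bookkeeping entries with non-IP names. The function name `count_ip_files` is hypothetical and not part of MicroK8s or flannel.

```shell
# count_ip_files DIR
# Prints the number of entries in DIR whose names look like IPv4 addresses.
# Assumption: each allocated address is a file named after the IP; entries
# with any other name are treated as bookkeeping files and ignored.
# Usage: count_ip_files /var/lib/cni/networks/microk8s-flannel-network
count_ip_files() {
  dir=$1
  # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
  ls -1 "$dir" | grep -cE '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$' || true
}
```

Comparing this number with the size of the range in the error message (10.1.78.1-10.1.78.254) shows how close the node is to exhaustion.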

However, only 50 pods were running, which means flanneld is not releasing the IP addresses of pods that no longer exist so that they can be reused.
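To see which container each address is reserved for, the files can be listed together with their contents. This is a rough sketch, assuming the first line of each address file is the ID of the owning container (which is how the cleanup script in this report treats the files). The function name `list_allocations` is hypothetical.

```shell
# list_allocations DIR
# Prints "IP CONTAINER_ID" for every address file in DIR.
# Assumption: the first line of each address file is the owning container ID.
# Usage: list_allocations /var/lib/cni/networks/microk8s-flannel-network
list_allocations() {
  dir=$1
  for f in "$dir"/*; do
    name=$(basename "$f")
    case $name in
      *[!0-9.]*) continue ;;   # skip entries whose names are not IP-like
    esac
    printf '%s %s\n' "$name" "$(head -n 1 "$f" | tr -d '\r')"
  done
}
```

Far more lines in this output than running pods would be consistent with the leak described above.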

I ran this script, which removes the IP address files not used by any Docker container:

cd /var/lib/cni/networks/microk8s-flannel-network
for hash in $(tail -n +1 * | grep '^[A-Za-z0-9]*$' | cut -c 1-8); do if [ -z "$(docker ps -a | grep "$hash" | awk '{print $1}')" ]; then grep -ir "$hash" ./ | awk -F: '{print $1}'; fi; done | xargs -r rm

Running this script, which removed the files in /var/lib/cni/networks/microk8s-flannel-network that did not correspond to any container ID known to Docker, fixed the problem.
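Since the one-liner deletes files immediately, a dry-run variant may be safer to try first. The sketch below only prints the files it considers stale. It assumes the first line of each address file is a full container ID and that the list of live IDs comes from `docker ps -a -q --no-trunc`. The function name `find_stale_allocations` and the `/tmp/live_ids` path are hypothetical.

```shell
# find_stale_allocations DIR LIVE_IDS_FILE
# Prints the path of every address file in DIR whose recorded container ID
# does not appear in LIVE_IDS_FILE (one full ID per line). Deletes nothing.
# Usage:
#   docker ps -a -q --no-trunc > /tmp/live_ids
#   find_stale_allocations /var/lib/cni/networks/microk8s-flannel-network /tmp/live_ids
find_stale_allocations() {
  dir=$1
  live=$2
  for f in "$dir"/*; do
    name=$(basename "$f")
    case $name in
      *[!0-9.]*) continue ;;   # skip entries whose names are not IP-like
    esac
    id=$(head -n 1 "$f" | tr -d '\r')
    [ -n "$id" ] || continue   # skip files with an empty first line
    # -x: whole-line match, -F: fixed string, so IDs are compared exactly.
    if ! grep -qxF -- "$id" "$live"; then
      printf '%s\n' "$f"
    fi
  done
}
```

Once the printed list has been reviewed, it can be piped to `xargs -r rm` to release the addresses.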

Any idea on what the root cause is?

Top GitHub Comments

balchua commented, Oct 14, 2020

@akanso I think it's OK to use docker instead of containerd. As far as I can tell, the flannel data directory is ${SNAP_COMMON}/var/lib/cni/flannel, where $SNAP_COMMON is /var/snap/microk8s/common, and not /var/lib/cni/networks/microk8s-flannel-network.

Would it be possible to attach the inspect tarball? Thanks

stale[bot] commented, Sep 9, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.