Flannel on MicroK8s 1.18.9 running out of IP addresses when creating pods

Flannel on MicroK8s 1.18.9 is running out of IP addresses when creating a pod:

network: failed to allocate for range 0: no IP addresses available in range set: 10.1.78.1-10.1.78.254

Flannel is showing some errors connecting to etcd, but this is not keeping it from doing its job:

Service for snap application microk8s.daemon-flanneld
   Loaded: loaded (/etc/systemd/system/snap.microk8s.daemon-flanneld.service; enabled; vendor preset: enabled)
   Active: active (running) since Sat 2020-09-26 00:35:20 UTC; 1 weeks 4 days ago
 Main PID: 51073 (flanneld)
    Tasks: 40 (limit: 19660)
   CGroup: /system.slice/snap.microk8s.daemon-flanneld.service
           └─51073 /snap/microk8s/1702/opt/cni/bin/flanneld --iface= --etcd-endpoints=https://127.0.0.1:12379 --etcd-cafile=/var/snap/microk8s/1702/certs/ca.crt --etcd-certfile=/var/snap/microk8s/1702/certs/server.crt --etcd-keyfil

Oct 06 23:44:09 adoagent-ADOPRTests000047 microk8s.daemon-flanneld[51073]: E1006 23:44:09.842382   51073 watch.go:171] Subnet watch failed: client: etcd cluster is unavailable or misconfigured; error #0: unexpected EOF
Oct 06 23:44:09 adoagent-ADOPRTests000047 microk8s.daemon-flanneld[51073]: E1006 23:44:09.842435   51073 watch.go:43] Watch subnets: client: etcd cluster is unavailable or misconfigured; error #0: unexpected EOF
Oct 06 23:45:56 adoagent-ADOPRTests000047 microk8s.daemon-flanneld[51073]: E1006 23:45:56.888018   51073 watch.go:43] Watch subnets: client: etcd cluster is unavailable or misconfigured; error #0: unexpected EOF
Oct 06 23:45:56 adoagent-ADOPRTests000047 microk8s.daemon-flanneld[51073]: E1006 23:45:56.888019   51073 watch.go:171] Subnet watch failed: client: etcd cluster is unavailable or misconfigured; error #0: unexpected EOF
Oct 07 12:35:31 adoagent-ADOPRTests000047 microk8s.daemon-flanneld[51073]: I1007 12:35:31.048926   51073 main.go:421] Lease renewed, new expiration: 2020-10-08 12:35:31.04249333 +0000 UTC
Oct 07 12:35:31 adoagent-ADOPRTests000047 microk8s.daemon-flanneld[51073]: I1007 12:35:31.048987   51073 main.go:429] Waiting for 22h59m59.993509265s to renew lease
Oct 07 16:52:36 adoagent-ADOPRTests000047 microk8s.daemon-flanneld[51073]: E1007 16:52:36.409586   51073 watch.go:171] Subnet watch failed: client: etcd cluster is unavailable or misconfigured; error #0: unexpected EOF
Oct 07 16:52:36 adoagent-ADOPRTests000047 microk8s.daemon-flanneld[51073]: E1007 16:52:36.409622   51073 watch.go:43] Watch subnets: client: etcd cluster is unavailable or misconfigured; error #0: unexpected EOF

However, microk8s.inspect shows flannel as running:

microk8s.inspect
Inspecting Certificates
Inspecting services
  Service snap.microk8s.daemon-cluster-agent is running
  Service snap.microk8s.daemon-flanneld is running
  Service snap.microk8s.daemon-containerd is running
  Service snap.microk8s.daemon-apiserver is running

Testing etcd with the same credentials given to flanneld shows that it is functioning properly:

etcdctl --endpoints https://127.0.0.1:12379 --ca-file=/var/snap/microk8s/1702/certs/ca.crt --cert-file=/var/snap/microk8s/1702/certs/server.crt  --key-file=/var/snap/microk8s/1702/certs/server.key --debug cluster-health

Cluster-Endpoints: https://127.0.0.1:12379
cURL Command: curl -X GET https://127.0.0.1:12379/v2/members
member 8e9e05c52164694d is healthy: got healthy result from https://10.3.0.121:12379
cluster is healthy

Looking at the list of IP addresses that flanneld assigns, we can see that the count is already at 255 - 1 (254), which means the range is maxed out.

ls /var/lib/cni/networks/microk8s-flannel-network | wc
    255     255    2954
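
Note that the ls | wc count above includes every file in the directory, not only address reservations. The following is a minimal sketch for counting just the reservations, assuming each reserved address is stored as a file named after its IPv4 address. The function name count_reserved_ips is hypothetical and is not part of MicroK8s or flannel.

```shell
# Count files whose names look like IPv4 addresses in a CNI network directory.
# Assumption: each reserved address is stored as a file named after the IP.
count_reserved_ips() {
  dir="$1"
  # ls -1 prints one name per line; grep -c counts names shaped like a dotted quad.
  ls -1 "$dir" | grep -Ec '^[0-9]{1,3}(\.[0-9]{1,3}){3}$'
}

# Usage (path taken from the report above):
# count_reserved_ips /var/lib/cni/networks/microk8s-flannel-network
```

Comparing this number with the number of running pods gives a rough idea of how many reservations may be stale.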

However, only 50 pods were running, which means flanneld is not cleaning up the IP addresses that are no longer used by running pods so that they can be reused.

I ran this script, which cleans up the IP addresses not used by a docker container:

cd /var/lib/cni/networks/microk8s-flannel-network
for hash in $(tail -n +1 * | grep '^[A-Za-z0-9]*$' | cut -c 1-8); do if [ -z "$(docker ps -a | grep "$hash" | awk '{print $1}')" ]; then grep -ir "$hash" ./ | awk -F: '{print $1}'; fi; done | xargs rm

Running this script, which removed the files in /var/lib/cni/networks/microk8s-flannel-network that did not correspond to an existing container ID, fixed the problem.
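
Before deleting anything, it may be safer to list the candidate files first. The following dry-run sketch only prints reservation files whose owner is not in a list of live container IDs. It assumes that each file named after an IPv4 address holds the owning container ID on its first line, and that the live IDs are supplied one per line in a separate file. The function name list_stale_reservations and the /tmp/live_ids path are hypothetical.

```shell
# Dry run: print reservation files whose container ID is not in a list of live IDs.
# Assumptions: files named after an IPv4 address store the owning container ID
# on their first line; live IDs are given one per line in the file "$2".
list_stale_reservations() {
  dir="$1"
  live_ids="$2"
  for f in "$dir"/*; do
    name=$(basename "$f")
    # Skip anything that is not named like an IPv4 address (for example lock files).
    printf '%s\n' "$name" | grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){3}$' || continue
    id=$(head -n 1 "$f")
    [ -n "$id" ] || continue
    # Print the file when no live ID matches the stored ID exactly.
    grep -qxF -- "$id" "$live_ids" || printf '%s\n' "$f"
  done
}

# Usage, assuming docker is the runtime as in the report above:
# docker ps -a -q --no-trunc > /tmp/live_ids
# list_stale_reservations /var/lib/cni/networks/microk8s-flannel-network /tmp/live_ids
```

Unlike the one-liner above, this compares full container IDs rather than 8-character prefixes, and it deletes nothing; the printed list can be reviewed before any files are removed.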

Any idea what the root cause is?

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions:1
  • Comments:5

Top GitHub Comments

1 reaction
balchua commented, Oct 14, 2020

@akanso I think it's OK to use docker instead of containerd. As far as I can tell, the flannel data directory is ${SNAP_COMMON}/var/lib/cni/flannel, where $SNAP_COMMON is /var/snap/microk8s/common, and not /var/lib/cni/networks/microk8s-flannel-network.

Would it be possible to attach the inspect tarball? Thanks.

0 reactions
stale[bot] commented, Sep 9, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
