Microk8s crashes when attempting to join cluster

I’m attempting to set up a mixed-architecture microk8s cluster. I have an x86 node that I’m using as the initial leader node and two Raspberry Pi 4B nodes.

Joining one of the Pi nodes has failed 3 times and worked once.

The latest attempt looked like:

ubuntu@pidev2:~$ microk8s join 192.168.1.5:25000/<token>
Contacting cluster at 192.168.1.5
Waiting for this node to finish joining the cluster. ..  
ubuntu@pidev2:~$ microk8s status
microk8s is not running. Use microk8s inspect for a deeper inspection.
ubuntu@pidev2:~$ microk8s inspect
Inspecting Certificates
Inspecting services
  Service snap.microk8s.daemon-cluster-agent is running
  Service snap.microk8s.daemon-containerd is running
  Service snap.microk8s.daemon-apiserver is running
  Service snap.microk8s.daemon-apiserver-kicker is running
  Service snap.microk8s.daemon-control-plane-kicker is running
  Service snap.microk8s.daemon-proxy is running
  Service snap.microk8s.daemon-kubelet is running
  Service snap.microk8s.daemon-scheduler is running
  Service snap.microk8s.daemon-controller-manager is running
  Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
  Copy processes list to the final report tarball
  Copy snap list to the final report tarball
  Copy VM name (or none) to the final report tarball
  Copy disk usage information to the final report tarball
  Copy memory usage information to the final report tarball
  Copy server uptime to the final report tarball
  Copy current linux distribution to the final report tarball
  Copy openSSL information to the final report tarball
  Copy network configuration to the final report tarball
Inspecting kubernetes cluster
  Inspect kubernetes cluster
Inspecting juju
  Inspect Juju
Inspecting kubeflow
  Inspect Kubeflow

 WARNING:  IPtables FORWARD policy is DROP. Consider enabling traffic forwarding with: sudo iptables -P FORWARD ACCEPT 
The change can be made persistent with: sudo apt-get install iptables-persistent
WARNING:  Docker is installed. 
File "/etc/docker/daemon.json" does not exist. 
You should create it and add the following lines: 
{
    "insecure-registries" : ["localhost:32000"] 
}
and then restart docker with: sudo systemctl restart docker
Building the report tarball
  Report tarball is at /var/snap/microk8s/1892/inspection-report-20210109_232609.tar.gz
ubuntu@pidev2:~$ microk8s status
microk8s is not running. Use microk8s inspect for a deeper inspection.

This is the second attempt on this node. The second attempt on another Raspberry Pi node succeeded.

The nodes are identical (set up with ansible) and I’ve verified I can connect from both to the leader node on 25000. ufw is not running on any node.

I’ll attach the inspection reports as soon as I figure out how to get GitHub to let me.

Issue Analytics

Created 3 years ago
Comments: 16 (3 by maintainers)

Top GitHub Comments

10 reactions

ktsakalozos commented, Jan 14, 2021

The command to forcefully remove a dqlite (the datastore) node from the cluster is:

/snap/microk8s/current/bin/dqlite -s file:///var/snap/microk8s/current/var/kubernetes/backend/cluster.yaml -c /var/snap/microk8s/current/var/kubernetes/backend/cluster.crt -k /var/snap/microk8s/current/var/kubernetes/backend/cluster.key -f json k8s ".remove <node-ip-with-port-19001>"

The <node-ip-with-port-19001> is the Address you see in cat /var/snap/microk8s/current/var/kubernetes/backend/cluster.yaml.
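
To find the value to substitute for <node-ip-with-port-19001>, the addresses can be pulled out of cluster.yaml with a small helper. This is only a sketch: the function name list_dqlite_addresses is made up for illustration, and it assumes each node entry in the file contains a line of the form "Address: <ip>:<port>", which is what the comment above suggests but may differ between microk8s versions.

```shell
# Hypothetical helper: print the Address value of every node listed in a
# dqlite cluster.yaml file, one per line.
list_dqlite_addresses() {
    # Default to the path quoted in the comment above; pass a path to override.
    file="${1:-/var/snap/microk8s/current/var/kubernetes/backend/cluster.yaml}"
    # Assumption: entries look like "- Address: 192.168.1.5:19001".
    # Strip the leading list marker, whitespace, and the "Address:" key.
    sed -n 's/^[-[:space:]]*Address:[[:space:]]*//p' "$file"
}
```

Running list_dqlite_addresses with no argument reads the default path (this may need sudo, depending on file permissions). Pick the address of the node that failed to join and use it in the .remove command.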

I am interested in understanding how the cluster got into this state so we can guard against it. Would any of you recall the exact steps you took? I am looking for a way to reproduce this problem.

1 reaction

katlego-maleka commented, Jun 2, 2021

This worked for me:

firewall-cmd --add-port=19001/tcp --permanent
firewall-cmd --reload
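
Before or after changing firewall rules, it may help to confirm from the joining node that the relevant ports on the leader are actually reachable. The sketch below is an illustration, not part of microk8s: port_open is a made-up helper name, and it assumes bash (for its /dev/tcp feature) and the coreutils timeout command are available.

```shell
# Hypothetical helper: succeed (exit 0) if a TCP connection to host:port
# can be opened within 3 seconds, fail otherwise.
port_open() {
    # Inside the child bash, $0 is the host and $1 is the port.
    # Opening /dev/tcp/<host>/<port> makes bash attempt a TCP connection.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}
```

For the cluster in this issue, one could run `port_open 192.168.1.5 25000 && echo reachable` for the join port and `port_open 192.168.1.5 19001 && echo reachable` for the dqlite port mentioned above. A failure only shows that the connection could not be made from this node; it does not say whether a firewall or a stopped service is the cause.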