Microk8s crashes when attempting to join cluster

I’m attempting to set up a mixed-architecture MicroK8s cluster. I have an x86 node that I’m using as the starting leader node and two Raspberry Pi 4B nodes.

Attempting to join one of the pik8s nodes has failed 3 times and worked once.

The latest attempt looked like:

ubuntu@pidev2:~$ microk8s join 192.168.1.5:25000/<token>
Contacting cluster at 192.168.1.5
Waiting for this node to finish joining the cluster. ..  
ubuntu@pidev2:~$ microk8s status
microk8s is not running. Use microk8s inspect for a deeper inspection.
ubuntu@pidev2:~$ microk8s inspect
Inspecting Certificates
Inspecting services
  Service snap.microk8s.daemon-cluster-agent is running
  Service snap.microk8s.daemon-containerd is running
  Service snap.microk8s.daemon-apiserver is running
  Service snap.microk8s.daemon-apiserver-kicker is running
  Service snap.microk8s.daemon-control-plane-kicker is running
  Service snap.microk8s.daemon-proxy is running
  Service snap.microk8s.daemon-kubelet is running
  Service snap.microk8s.daemon-scheduler is running
  Service snap.microk8s.daemon-controller-manager is running
  Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
  Copy processes list to the final report tarball
  Copy snap list to the final report tarball
  Copy VM name (or none) to the final report tarball
  Copy disk usage information to the final report tarball
  Copy memory usage information to the final report tarball
  Copy server uptime to the final report tarball
  Copy current linux distribution to the final report tarball
  Copy openSSL information to the final report tarball
  Copy network configuration to the final report tarball
Inspecting kubernetes cluster
  Inspect kubernetes cluster
Inspecting juju
  Inspect Juju
Inspecting kubeflow
  Inspect Kubeflow

 WARNING:  IPtables FORWARD policy is DROP. Consider enabling traffic forwarding with: sudo iptables -P FORWARD ACCEPT 
The change can be made persistent with: sudo apt-get install iptables-persistent
WARNING:  Docker is installed. 
File "/etc/docker/daemon.json" does not exist. 
You should create it and add the following lines: 
{
    "insecure-registries" : ["localhost:32000"] 
}
and then restart docker with: sudo systemctl restart docker
Building the report tarball
  Report tarball is at /var/snap/microk8s/1892/inspection-report-20210109_232609.tar.gz
ubuntu@pidev2:~$ microk8s status
microk8s is not running. Use microk8s inspect for a deeper inspection.

This is the second attempt on this node. The second attempt on another Raspberry Pi node succeeded.

The nodes are identical (set up with ansible) and I’ve verified I can connect from both to the leader node on 25000. ufw is not running on any node.

I’ll attach the inspection reports as soon as I figure out how to get GitHub to let me.

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 16 (3 by maintainers)

Top GitHub Comments

10 reactions
ktsakalozos commented, Jan 14, 2021

The command to forcefully remove a dqlite (the datastore) node from the cluster is:

/snap/microk8s/current/bin/dqlite -s file:///var/snap/microk8s/current/var/kubernetes/backend/cluster.yaml -c /var/snap/microk8s/current/var/kubernetes/backend/cluster.crt -k /var/snap/microk8s/current/var/kubernetes/backend/cluster.key -f json k8s ".remove <node-ip-with-port-19001>"

The <node-ip-with-port-19001> value is the Address shown for the node you want to remove when you run cat /var/snap/microk8s/current/var/kubernetes/backend/cluster.yaml.
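
To make the lookup less error-prone, here is a rough Python sketch that pulls the Address values out of the text of cluster.yaml and assembles the removal command shown above for each one. It only builds and prints the command; it does not run anything. The helper names (`list_addresses`, `build_remove_command`) and the sample file contents are made up for illustration, and the sketch assumes each entry appears on its own line as `Address: <ip>:<port>`, so check that against your own file first.

```python
import re
import shlex

# Paths taken from the removal command quoted above.
BACKEND = "/var/snap/microk8s/current/var/kubernetes/backend"


def list_addresses(cluster_yaml_text):
    """Return every 'Address: host:port' value found in the text.

    Assumes one address per line, optionally prefixed by a YAML list dash.
    """
    pattern = r"^[ \t]*-?[ \t]*Address:[ \t]*(\S+)[ \t]*$"
    return re.findall(pattern, cluster_yaml_text, flags=re.MULTILINE)


def build_remove_command(address):
    """Build the dqlite argument list that removes the given node address."""
    return [
        "/snap/microk8s/current/bin/dqlite",
        "-s", f"file://{BACKEND}/cluster.yaml",
        "-c", f"{BACKEND}/cluster.crt",
        "-k", f"{BACKEND}/cluster.key",
        "-f", "json",
        "k8s",
        f".remove {address}",
    ]


# Hypothetical file contents, for illustration only.
sample = """\
- Address: 192.168.1.5:19001
  ID: 1
  Role: 0
- Address: 192.168.1.6:19001
  ID: 2
  Role: 0
"""

for addr in list_addresses(sample):
    # Print a copy-pasteable shell command; review it before running it.
    print(shlex.join(build_remove_command(addr)))
```

On a real node you would pass the contents of the cluster.yaml file instead of `sample`, and run only the command for the node that is actually stuck.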

I would like to understand how the cluster got into this state so we can guard against it. Does anyone recall the exact steps they took? I am looking for a way to reproduce this problem.

1 reaction
katlego-maleka commented, Jun 2, 2021

This worked for me:

firewall-cmd --add-port=19001/tcp --permanent
firewall-cmd --reload
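
Before changing firewall rules, it may help to confirm which ports are actually blocked. Below is a minimal Python sketch that tries a plain TCP connection to the two ports mentioned in this thread: 25000 (used by `microk8s join`) and 19001 (the dqlite port). The function names are hypothetical, and a successful connection only shows the port is reachable, not that the service behind it is healthy.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports=(25000, 19001)):
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_is_open(host, port) for port in ports}
```

Running `check_ports("192.168.1.5")` from the joining node (using the leader address from the original report) returns a dictionary such as `{25000: True, 19001: False}` if only the dqlite port is blocked.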
