[BUG] Microk8s crashes when joining a node using ha-cluster

I’m not sure whether the problem occurs because my master node is an Ubuntu machine and the worker runs Windows 10 Enterprise (with WSL enabled), but I thought this might be of interest.

Version: 1.19/stable

Steps to reproduce:

  1. Check with microk8s status and microk8s inspect before joining the cluster: everything seems to be fine.
  2. Make sure the ha-cluster add-on is enabled on both the master and the worker node.
  3. Run microk8s join x.x.x.x:25000/{TOKEN}: microk8s crashes silently.

No error message is output.

Output of microk8s status before joining:

microk8s is running
high-availability: no
  datastore master nodes: 127.0.0.1:19001
  datastore standby nodes: none
addons:
  enabled:
    ha-cluster           # Configure high availability on the current node
  disabled:
    ambassador           # Ambassador API Gateway and Ingress
    cilium               # SDN, fast with full network policy
    dashboard            # The Kubernetes dashboard
    dns                  # CoreDNS
    fluentd              # Elasticsearch-Fluentd-Kibana logging and monitoring
    gpu                  # Automatic enablement of Nvidia CUDA
    helm                 # Helm 2 - the package manager for Kubernetes
    helm3                # Helm 3 - Kubernetes package manager
    host-access          # Allow Pods connecting to Host services smoothly
    ingress              # Ingress controller for external access
    istio                # Core Istio service mesh services
    jaeger               # Kubernetes Jaeger operator with its simple config
    knative              # The Knative framework on Kubernetes.
    kubeflow             # Kubeflow for easy ML deployments
    linkerd              # Linkerd is a service mesh for Kubernetes and other frameworks
    metallb              # Loadbalancer for your Kubernetes cluster
    metrics-server       # K8s Metrics Server for API access to service metrics
    multus               # Multus CNI enables attaching multiple network interfaces to pods
    prometheus           # Prometheus operator for monitoring and logging
    rbac                 # Role-Based Access Control for authorisation
    registry             # Private image registry exposed on localhost:32000
    storage              # Storage class; allocates storage from host directory
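To compare the two nodes quickly (for example, to confirm that ha-cluster is enabled on both), the status text can be parsed with a few lines of Python. This is only a minimal sketch that assumes the plain-text layout shown above; parse_status is a hypothetical helper, not part of MicroK8s.

```python
def parse_status(text):
    """Parse plain-text `microk8s status` output in the layout shown above.

    Hypothetical helper for illustration; it is not part of MicroK8s.
    """
    result = {
        "running": False,
        "high_availability": None,  # raw value after "high-availability:"
        "enabled": [],
        "disabled": [],
    }
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("microk8s is running"):
            result["running"] = True
        elif line.startswith("high-availability:"):
            result["high_availability"] = line.split(":", 1)[1].strip()
        elif line in ("enabled:", "disabled:"):
            section = line[:-1]
        elif section and "#" in line:
            # Add-on lines look like "name    # description".
            result[section].append(line.split("#", 1)[0].strip())
    return result
```

Feeding it the output above yields running set to True and ha-cluster as the only entry under "enabled"; for the "microk8s is not running" message, running stays False.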

Output of microk8s inspect before joining:

Inspecting Certificates
Inspecting services
  Service snap.microk8s.daemon-cluster-agent is running
  Service snap.microk8s.daemon-containerd is running
  Service snap.microk8s.daemon-apiserver is running
  Service snap.microk8s.daemon-apiserver-kicker is running
  Service snap.microk8s.daemon-control-plane-kicker is running
  Service snap.microk8s.daemon-proxy is running
  Service snap.microk8s.daemon-kubelet is running
  Service snap.microk8s.daemon-scheduler is running
  Service snap.microk8s.daemon-controller-manager is running
  Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
  Copy processes list to the final report tarball
  Copy snap list to the final report tarball
  Copy VM name (or none) to the final report tarball
  Copy disk usage information to the final report tarball
  Copy memory usage information to the final report tarball
  Copy server uptime to the final report tarball
  Copy current linux distribution to the final report tarball
  Copy openSSL information to the final report tarball
  Copy network configuration to the final report tarball
Inspecting kubernetes cluster
  Inspect kubernetes cluster
Inspecting juju
  Inspect Juju
Inspecting kubeflow
  Inspect Kubeflow

Output of join (finishes without further output):

Contacting cluster at 10.10.40.24
Waiting for this node to finish joining the cluster. .. .. .. .. .. .. .. .. .. ..

Output of microk8s status after joining:

microk8s is not running. Use microk8s inspect for a deeper inspection.

Output of microk8s inspect after joining:

Inspecting Certificates
Inspecting services
  Service snap.microk8s.daemon-cluster-agent is running
  Service snap.microk8s.daemon-containerd is running
 FAIL:  Service snap.microk8s.daemon-apiserver is not running
For more details look at: sudo journalctl -u snap.microk8s.daemon-apiserver
  Service snap.microk8s.daemon-apiserver-kicker is running
  Service snap.microk8s.daemon-control-plane-kicker is running
  Service snap.microk8s.daemon-proxy is running
  Service snap.microk8s.daemon-kubelet is running
  Service snap.microk8s.daemon-scheduler is running
  Service snap.microk8s.daemon-controller-manager is running
  Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
  Copy processes list to the final report tarball
  Copy snap list to the final report tarball
  Copy VM name (or none) to the final report tarball
  Copy disk usage information to the final report tarball
  Copy memory usage information to the final report tarball
  Copy server uptime to the final report tarball
  Copy current linux distribution to the final report tarball
  Copy openSSL information to the final report tarball
  Copy network configuration to the final report tarball
Inspecting kubernetes cluster
  Inspect kubernetes cluster
Inspecting juju
  Inspect Juju
Inspecting kubeflow
  Inspect Kubeflow

Building the report tarball
  Report tarball is at /var/snap/microk8s/1791/inspection-report-20201211_113621.tar.gz
An error occurred when trying to execute 'sudo microk8s.inspect' with 'multipass': returned exit code 1.
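The only difference from the inspect output before joining is the FAIL line for the API server. A small Python sketch like the following can pull failed services out of such output automatically; it assumes the "Service ... is running / is not running" line format shown above, and failed_services is a hypothetical helper name.

```python
import re

# Matches lines such as:
#   "  Service snap.microk8s.daemon-proxy is running"
#   " FAIL:  Service snap.microk8s.daemon-apiserver is not running"
SERVICE_RE = re.compile(
    r"^\s*(?:FAIL:\s*)?Service\s+(\S+)\s+is\s+(not running|running)\s*$"
)


def failed_services(inspect_output):
    """Return the names of services reported as not running."""
    failed = []
    for line in inspect_output.splitlines():
        match = SERVICE_RE.match(line)
        if match and match.group(2) == "not running":
            failed.append(match.group(1))
    return failed
```

Applied to the output above, this returns a list containing only snap.microk8s.daemon-apiserver, whose logs can then be read with the journalctl command that inspect suggests.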

And as you can imagine, the node is not added on the master node.

I reinstalled microk8s and removed the VM. After that everything seemed to be fine again, but when I tried to join, microk8s crashed again:

FAIL: Service snap.microk8s.daemon-apiserver is not running

Approximately 15 minutes later, microk8s seemed to be up and running again (but the API server was still down). When I tried to join the cluster again, I received a Python stack trace. Maybe that is just because the API server was down, but I am appending it here just in case.

Contacting cluster at 10.10.40.24
Traceback (most recent call last):
  File "/snap/microk8s/1791/scripts/cluster/join.py", line 967, in <module>
    join_dqlite(connection_parts)
  File "/snap/microk8s/1791/scripts/cluster/join.py", line 900, in join_dqlite
    update_dqlite(info["cluster_cert"], info["cluster_key"], info["voters"], hostname_override)
  File "/snap/microk8s/1791/scripts/cluster/join.py", line 818, in update_dqlite
    with open("{}/info.yaml".format(cluster_backup_dir)) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/var/snap/microk8s/1791/var/kubernetes/backend.backup/info.yaml'
An error occurred when trying to execute 'sudo microk8s.join 10.10.40.24:25000/04f6ac0ea469893c594e5b30954618f0' with 'multipass': returned exit code 1.
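The traceback shows the join script opening info.yaml in the backend.backup directory without first checking that the file exists. The sketch below illustrates a defensive read of that file. It is an assumption about how such a guard could look, not the actual join.py code or the upstream fix, and read_backup_info is a hypothetical helper.

```python
import os


def read_backup_info(cluster_backup_dir):
    """Return the text of info.yaml in the backup directory, or None if absent.

    Hypothetical helper: it only illustrates guarding the open() call that
    raised FileNotFoundError in the traceback above.
    """
    info_path = os.path.join(cluster_backup_dir, "info.yaml")
    if not os.path.isfile(info_path):
        return None
    with open(info_path) as f:
        return f.read()
```

A caller could treat a None result as "no usable backup from a previous join attempt" and report a clear error message instead of an unhandled exception.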

NOTE: Resolved in the meantime by disabling the ha-cluster add-on on both nodes. It would be great if this issue could be fixed soon!

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 24

Top GitHub Comments

3 reactions
wsdt commented, Dec 26, 2020

Ended up using Kubernetes natively and now everything seems fine.

1 reaction
balchua commented, Mar 17, 2021

@devZer0 can you upload the inspect tarball? Thanks
