[BUG] Microk8s crashes when joining a node using ha-cluster
I'm not sure whether the problem occurs because my master node is an Ubuntu machine while the worker runs Windows 10 Enterprise (with WSL enabled), but I thought this might be of interest.
Version: 1.19/stable
Steps to reproduce:
- Before joining the cluster, both microk8s status and microk8s inspect report that everything is fine.
- The ha-cluster add-on is enabled on both the master and the worker node.
- Running microk8s join x.x.x.x:25000/{TOKEN} (using the token produced by microk8s add-node on the master) makes MicroK8s crash silently; no error message is printed. A pre-flight connectivity check is sketched after this list.
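Since the join fails with no output at all, one quick way to rule out basic connectivity problems is to check that the master's cluster-agent port (25000) is reachable from the worker before joining. This is not part of the MicroK8s tooling, just a minimal pre-flight sketch in Python, with x.x.x.x as a placeholder for the master's address:

import socket

def cluster_agent_reachable(host, port=25000, timeout=3.0):
    """Return True if the cluster-agent port accepts TCP connections.
    Hypothetical pre-flight check, not part of microk8s itself."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(cluster_agent_reachable("x.x.x.x"))  # placeholder master address

In my case this check would likely have passed, since the join output below shows the cluster being contacted before the crash.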
Output of microk8s status before joining:
microk8s is running
high-availability: no
datastore master nodes: 127.0.0.1:19001
datastore standby nodes: none
addons:
enabled:
ha-cluster # Configure high availability on the current node
disabled:
ambassador # Ambassador API Gateway and Ingress
cilium # SDN, fast with full network policy
dashboard # The Kubernetes dashboard
dns # CoreDNS
fluentd # Elasticsearch-Fluentd-Kibana logging and monitoring
gpu # Automatic enablement of Nvidia CUDA
helm # Helm 2 - the package manager for Kubernetes
helm3 # Helm 3 - Kubernetes package manager
host-access # Allow Pods connecting to Host services smoothly
ingress # Ingress controller for external access
istio # Core Istio service mesh services
jaeger # Kubernetes Jaeger operator with its simple config
knative # The Knative framework on Kubernetes.
kubeflow # Kubeflow for easy ML deployments
linkerd # Linkerd is a service mesh for Kubernetes and other frameworks
metallb # Loadbalancer for your Kubernetes cluster
metrics-server # K8s Metrics Server for API access to service metrics
multus # Multus CNI enables attaching multiple network interfaces to pods
prometheus # Prometheus operator for monitoring and logging
rbac # Role-Based Access Control for authorisation
registry # Private image registry exposed on localhost:32000
storage # Storage class; allocates storage from host directory
Output of microk8s inspect before joining:
Inspecting Certificates
Inspecting services
Service snap.microk8s.daemon-cluster-agent is running
Service snap.microk8s.daemon-containerd is running
Service snap.microk8s.daemon-apiserver is running
Service snap.microk8s.daemon-apiserver-kicker is running
Service snap.microk8s.daemon-control-plane-kicker is running
Service snap.microk8s.daemon-proxy is running
Service snap.microk8s.daemon-kubelet is running
Service snap.microk8s.daemon-scheduler is running
Service snap.microk8s.daemon-controller-manager is running
Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
Copy processes list to the final report tarball
Copy snap list to the final report tarball
Copy VM name (or none) to the final report tarball
Copy disk usage information to the final report tarball
Copy memory usage information to the final report tarball
Copy server uptime to the final report tarball
Copy current linux distribution to the final report tarball
Copy openSSL information to the final report tarball
Copy network configuration to the final report tarball
Inspecting kubernetes cluster
Inspect kubernetes cluster
Inspecting juju
Inspect Juju
Inspecting kubeflow
Inspect Kubeflow
Output of microk8s join (the command finishes with no further output):
Contacting cluster at 10.10.40.24
Waiting for this node to finish joining the cluster. .. .. .. .. .. .. .. .. .. ..
Output of microk8s status after joining:
microk8s is not running. Use microk8s inspect for a deeper inspection.
Output of microk8s inspect after joining:
Inspecting Certificates
Inspecting services
Service snap.microk8s.daemon-cluster-agent is running
Service snap.microk8s.daemon-containerd is running
FAIL: Service snap.microk8s.daemon-apiserver is not running
For more details look at: sudo journalctl -u snap.microk8s.daemon-apiserver
Service snap.microk8s.daemon-apiserver-kicker is running
Service snap.microk8s.daemon-control-plane-kicker is running
Service snap.microk8s.daemon-proxy is running
Service snap.microk8s.daemon-kubelet is running
Service snap.microk8s.daemon-scheduler is running
Service snap.microk8s.daemon-controller-manager is running
Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
Copy processes list to the final report tarball
Copy snap list to the final report tarball
Copy VM name (or none) to the final report tarball
Copy disk usage information to the final report tarball
Copy memory usage information to the final report tarball
Copy server uptime to the final report tarball
Copy current linux distribution to the final report tarball
Copy openSSL information to the final report tarball
Copy network configuration to the final report tarball
Inspecting kubernetes cluster
Inspect kubernetes cluster
Inspecting juju
Inspect Juju
Inspecting kubeflow
Inspect Kubeflow
Building the report tarball
Report tarball is at /var/snap/microk8s/1791/inspection-report-20201211_113621.tar.gz
An error occurred when trying to execute 'sudo microk8s.inspect' with 'multipass': returned exit code 1.
As you can imagine, the node is not added on the master side.
I removed the VM and reinstalled MicroK8s. Everything then seemed fine again, but as soon as I tried to join, MicroK8s crashed again:
FAIL: Service snap.microk8s.daemon-apiserver is not running
Approximately 15 minutes later, MicroK8s seemed to be up and running again (although the API server was still down). When I tried to join the cluster once more, I received a Python stack trace. It may simply be a consequence of the API server being down, but I am appending it here just in case:
Contacting cluster at 10.10.40.24
Traceback (most recent call last):
  File "/snap/microk8s/1791/scripts/cluster/join.py", line 967, in <module>
    join_dqlite(connection_parts)
  File "/snap/microk8s/1791/scripts/cluster/join.py", line 900, in join_dqlite
    update_dqlite(info["cluster_cert"], info["cluster_key"], info["voters"], hostname_override)
  File "/snap/microk8s/1791/scripts/cluster/join.py", line 818, in update_dqlite
    with open("{}/info.yaml".format(cluster_backup_dir)) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/var/snap/microk8s/1791/var/kubernetes/backend.backup/info.yaml'
An error occurred when trying to execute 'sudo microk8s.join 10.10.40.24:25000/04f6ac0ea469893c594e5b30954618f0' with 'multipass': returned exit code 1.
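The traceback shows update_dqlite() in join.py opening info.yaml from the dqlite backup directory without first checking that the backup exists. As a sketch only (load_backup_info and its error message are mine, not code from join.py), a guard of roughly this shape would turn the unhandled FileNotFoundError into an actionable error:

import os
import sys

import yaml


def load_backup_info(cluster_backup_dir):
    """Read info.yaml from the dqlite backup directory, exiting with a
    clear message instead of an unhandled FileNotFoundError.
    Hypothetical helper, not part of join.py."""
    info_path = os.path.join(cluster_backup_dir, "info.yaml")
    if not os.path.isfile(info_path):
        sys.exit(
            "Join aborted: {} is missing, so the previous dqlite state "
            "was never backed up.".format(info_path)
        )
    with open(info_path) as f:
        return yaml.safe_load(f)

On the failing node above, load_backup_info("/var/snap/microk8s/1791/var/kubernetes/backend.backup") would exit with that message rather than crashing with the stack trace.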
NOTE: In the meantime I resolved this by disabling the ha-cluster add-on on both nodes (microk8s disable ha-cluster). It would be great if this issue could be fixed soon!
Issue Analytics
- Created 3 years ago
- Comments: 24
Top GitHub Comments
Ended up using Kubernetes natively and now everything seems fine.
@devZer0 can you upload the inspect tarball? Thanks