Leader election and dqlite errors when recovering nodes in HA cluster
Hello,
We have an HA cluster setup with three nodes, each running version 1.21.7:
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
masterA Ready <none> 37m v1.21.7-3+7700880a5c71e2 X.X.X.X <none> Ubuntu 18.04.6 LTS 5.4.0-79-generic containerd://1.4.4
masterB Ready <none> 32m v1.21.7-3+7700880a5c71e2 X.X.X.X <none> Ubuntu 18.04.6 LTS 5.4.0-79-generic containerd://1.4.4
masterC Ready <none> 42m v1.21.7-3+7700880a5c71e2 X.X.X.X <none> Ubuntu 18.04.6 LTS 5.4.0-79-generic containerd://1.4.4
We came across an issue when two nodes, masterA and masterB, were removed ungracefully and shut down. The elected leader node was masterA. The following errors occurred around the same time on the remaining node, masterC:
leaderelection.go:325] error retrieving resource lock kube-system/kube-scheduler: Get "https://127.0.0.1:16443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-scheduler?timeout=15s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}: context canceled
apiserver was unable to write a JSON response: http: Handler timeout
apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http: Handler timeout"}: http: Handler timeout
apiserver was unable to write a fallback JSON response: http: Handler timeout
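For context, these messages were taken from the kubelite journal on masterC; in this release the apiserver, scheduler and controller-manager all run inside the single kubelite daemon, so one unit journal covers them. A rough sketch of how they can be pulled, assuming the standard snap unit name:
# leader-election and apiserver errors on the surviving node
sudo journalctl -u snap.microk8s.daemon-kubelite --since "1 hour ago" | grep -E "leaderelection|apiserver"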
At this point, the elected leader remained the same (masterA), which was still powered off; however, when we powered on masterB, it failed to start the kubelite service:
microk8s.daemon-kubelite[8542]: Error: start node: raft_start(): io: load closed segment 0000000001324915-0000000001325281: entries batch 52 starting at byte 1041968: data checksum mismatch
Could the segment be corrupt, or would this suggest that masterB cannot sync the dqlite files to the elected leader masterA, which is still unavailable? If so, is there a way we can validate the checksum? In order to recover masterB, I had to delete the mentioned dqlite segment file and restart the kubelite service. Once masterB and masterC were available, a new leader node was elected (masterB) and we were able to recover the cluster.
Checking the HA documentation: with only a single node available, would this render the cluster inoperable? Essentially, do we need more than one node available at any time?
There were a few other suggestions, such as increasing the following arguments on the kube-scheduler and kube-controller-manager (source); a sketch of where these would go in a MicroK8s deployment follows the two flags below:
--leader-elect-lease-duration=60s
--leader-elect-renew-deadline=40s
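If we do end up relaxing these, my understanding is that MicroK8s reads component arguments from per-service files under /var/snap/microk8s/current/args/, so it would be something along these lines (a sketch, assuming those default paths):
# append the relaxed leader-election timings to both components
for component in kube-scheduler kube-controller-manager; do
  echo "--leader-elect-lease-duration=60s" | sudo tee -a /var/snap/microk8s/current/args/$component
  echo "--leader-elect-renew-deadline=40s" | sudo tee -a /var/snap/microk8s/current/args/$component
done

# restart kubelite on each node so the new arguments take effect
sudo snap restart microk8s.daemon-kubelite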
A number of comments mentioned that they had the same issue with microk8s v1.21. The last potential cause was a “resource crunch or network issue”, mentioned here. We have not yet been able to replicate the issue, but we would appreciate it if anyone could shed some light on this.
Top GitHub Comments
It’s not planned immediately but was already discussed, and imo it is useful to add. Will try to do it within a reasonable timeframe.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.