
Leader election and dqlite errors when recovering nodes in HA cluster


Hello,

We have an HA cluster set up with three nodes, each running version 1.21.7:

NAME                    STATUS   ROLES    AGE   VERSION                    INTERNAL-IP      EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
masterA                 Ready    <none>   37m   v1.21.7-3+7700880a5c71e2   X.X.X.X          <none>        Ubuntu 18.04.6 LTS   5.4.0-79-generic   containerd://1.4.4
masterB                 Ready    <none>   32m   v1.21.7-3+7700880a5c71e2   X.X.X.X          <none>        Ubuntu 18.04.6 LTS   5.4.0-79-generic   containerd://1.4.4
masterC                 Ready    <none>   42m   v1.21.7-3+7700880a5c71e2   X.X.X.X          <none>        Ubuntu 18.04.6 LTS   5.4.0-79-generic   containerd://1.4.4
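For reference, the listing above is the kind of output produced by kubectl get nodes -o wide. The checks we normally run (assuming a stock MicroK8s snap install; both commands are standard MicroK8s CLI) are:

microk8s status --wait-ready        # reports "high-availability: yes" and the dqlite datastore nodes
microk8s kubectl get nodes -o wide  # produces the node table shown above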

We came across an issue when two nodes, masterA and masterB, were removed ungracefully and shut down. The elected leader node was masterA. The following errors occurred around the same time on the remaining node, masterC:

leaderelection.go:325] error retrieving resource lock kube-system/kube-scheduler: Get "https://127.0.0.1:16443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-scheduler?timeout=15s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}: context canceled
apiserver was unable to write a JSON response: http: Handler timeout
apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http: Handler timeout"}: http: Handler timeout
apiserver was unable to write a fallback JSON response: http: Handler timeout
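My understanding of these messages is that kube-scheduler is failing to renew its leader-election Lease through the local apiserver, which itself stalls once the dqlite datastore behind it loses quorum. For anyone who wants to look at the lease state directly, a minimal check (plain kubectl, nothing MicroK8s-specific) is:

# holderIdentity and renewTime show who currently holds scheduler leadership
# and when the lease was last renewed
microk8s kubectl -n kube-system get lease kube-scheduler -o yaml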

At this point, the elected leader node remained masterA, which was still powered off; however, when we powered on masterB, it failed to start the kubelite service:

microk8s.daemon-kubelite[8542]: Error: start node: raft_start(): io: load closed segment 0000000001324915-0000000001325281: entries batch 52 starting at byte 1041968: data checksum mismatch
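In case it helps others, the segment files named in this error are the on-disk dqlite data. Where to look, assuming the default MicroK8s snap layout, is:

ls -l /var/snap/microk8s/current/var/kubernetes/backend/   # dqlite segment files plus cluster.yaml and info.yaml
journalctl -u snap.microk8s.daemon-kubelite -b             # full kubelite output, including the raft_start() error above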

Could the segment be corrupt, or would this suggest that it cannot sync the dqlite files to the elected leader masterA, which is still unavailable? If so, is there a way we can validate the checksum? In order to recover masterB, I had to delete the mentioned dqlite file and restart the kubelite service. Once masterB and masterC were available, a new leader node was elected (masterB) and we were able to recover the cluster.
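For the record, the recovery on masterB amounted to roughly the following. This is just what worked for us, not an official procedure; the paths assume the default snap layout and the segment name is specific to our cluster:

sudo snap stop microk8s.daemon-kubelite
cd /var/snap/microk8s/current/var/kubernetes/backend/
sudo cp 0000000001324915-0000000001325281 /root/    # keep a copy before deleting
sudo rm 0000000001324915-0000000001325281
sudo snap start microk8s.daemon-kubelite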

Checking the HA documentation, with only a single node available, would this render the cluster inoperable? Essentially, would we need more than one node available at any time?
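My own back-of-the-envelope reading of the Raft/dqlite model is that the datastore needs a majority of voters:

quorum = floor(n / 2) + 1 = floor(3 / 2) + 1 = 2

so with only masterC up (1 of 3 voters) the quorum of 2 cannot be met, and the datastore, and therefore the apiserver, cannot make progress until a second node returns.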

There were a few other suggestions, such as increasing the following arguments in the kube-scheduler and kube-controller-manager (source); a sketch of where these would be applied in MicroK8s follows the list:

--leader-elect-lease-duration=60s
--leader-elect-renew-deadline=40s
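To be explicit about where these would go: MicroK8s reads component arguments from per-service files, so applying the suggestion looks roughly like the following (default snap paths assumed; the values are the suggested ones, not something we have validated):

echo '--leader-elect-lease-duration=60s' | sudo tee -a /var/snap/microk8s/current/args/kube-scheduler
echo '--leader-elect-renew-deadline=40s' | sudo tee -a /var/snap/microk8s/current/args/kube-scheduler
echo '--leader-elect-lease-duration=60s' | sudo tee -a /var/snap/microk8s/current/args/kube-controller-manager
echo '--leader-elect-renew-deadline=40s' | sudo tee -a /var/snap/microk8s/current/args/kube-controller-manager
sudo snap restart microk8s.daemon-kubelite   # restart kubelite so the new arguments are picked up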

A number of comments mentioned that they had the same issue with microk8s v1.21. The last potential cause was a “resource crunch or network issue” mentioned here. We have not yet been able to replicate the issue but would appreciate it if anyone could shed some light on this.

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

6 reactions
MathieuBordere commented, Jan 10, 2022

@bc185174, the error raft_start(): io: load closed segment 0000000001324915-0000000001325281: entries batch 52 starting at byte 1041968: data checksum mismatch indicates some form of data corruption. This probably happened because of the unclean way the node was taken down. Maybe @MathieuBordere knows if there are any plans to perform some kind of (semi) automated “fsck” on the data and recover from such cases.

It’s not planned immediately but was already discussed, and imo is useful to add. Will try to do it within a reasonable timeframe.

0 reactions
stale[bot] commented, Dec 6, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
