Short-lived leader election reported when starting a Raft cluster

The issue can be reproduced with the example project by building it locally and adding a cluster.bat file with the following lines to the output folder:

START /B .\RaftNode.exe tcp 3262 node2 > node2.log
START /B .\RaftNode.exe tcp 3263 node3 > node3.log
START /B .\RaftNode.exe tcp 3264 node4 > node4.log

Run: del -r node* && .\cluster.bat

I can reproduce it within 2-6 attempts on a Windows x64 machine. The issue was originally found on a Raspberry Pi (ARM, Linux).

The leader prints this to its log:

New cluster leader is elected. Leader address is 127.0.0.1:3260
Term of local cluster member is 1. Election timeout 00:00:00.1590000
Consensus cannot be reached
Term of local cluster member is 1. Election timeout 00:00:00.1590000
New cluster leader is elected. Leader address is 127.0.0.1:3267
Term of local cluster member is 2. Election timeout 00:00:00.1590000
Accepting value 500
Accepting value 1000
Accepting value 1500
Accepting value 2000
...

The other nodes print the following (the leader also prints the save messages):

New cluster leader is elected. Leader address is 127.0.0.1:3267
Term of local cluster member is 2. Election timeout 00:00:00.1700000
Accepting value 500
Accepting value 1000
Accepting value 1500
Accepting value 2000
...
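
To spot the short-lived leader without reading each log by hand, the log files can be scanned for elections that are followed by "Consensus cannot be reached". The following is a minimal sketch, not part of the example project: the helper names `parse_elections` and `short_lived` are hypothetical, and the patterns assume the log lines look exactly like the ones quoted above.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Patterns copied from the log lines quoted in this issue.
LEADER_RE = re.compile(r"New cluster leader is elected\. Leader address is (\S+)")
TERM_RE = re.compile(r"Term of local cluster member is (\d+)\.")
LOST_MARKER = "Consensus cannot be reached"


@dataclass
class Election:
    leader: str                 # address reported in the "elected" line
    term: Optional[int] = None  # first term reported after that line
    lost_consensus: bool = False  # True if consensus was lost before the next election


def parse_elections(log_text: str) -> List[Election]:
    """Collect every reported election and what followed it (hypothetical helper)."""
    elections: List[Election] = []
    for raw in log_text.splitlines():
        line = raw.strip()
        leader = LEADER_RE.search(line)
        if leader:
            elections.append(Election(leader=leader.group(1)))
            continue
        if not elections:
            continue  # nothing to attach this line to yet
        current = elections[-1]
        term = TERM_RE.search(line)
        if term and current.term is None:
            current.term = int(term.group(1))
        elif LOST_MARKER in line:
            current.lost_consensus = True
    return elections


def short_lived(elections: List[Election]) -> List[Election]:
    """Elections whose leader lost consensus before the next one was reported."""
    return [e for e in elections if e.lost_consensus]
```

Calling `short_lived(parse_elections(open("node2.log").read()))` on a log like the leader's above would flag the term 1 election of 127.0.0.1:3260 and leave the term 2 election of 127.0.0.1:3267 alone.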

When the run is done, the RaftNode processes need to be killed via Task Manager, since they keep running in the background.
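
One way to avoid the manual cleanup is to start the nodes from a small script that keeps the process handles and terminates them afterwards. This is a rough sketch under stated assumptions: `launch` and `stop` are hypothetical helpers, and the command lines are the ones from cluster.bat above; nothing here comes from the example project itself.

```python
import subprocess
from pathlib import Path
from typing import Dict, List, Sequence

# Command lines taken from cluster.bat in this issue; adjust the path as needed.
NODES: Dict[str, Sequence[str]] = {
    "node2": [".\\RaftNode.exe", "tcp", "3262", "node2"],
    "node3": [".\\RaftNode.exe", "tcp", "3263", "node3"],
    "node4": [".\\RaftNode.exe", "tcp", "3264", "node4"],
}


def launch(commands: Dict[str, Sequence[str]], log_dir: Path) -> List[subprocess.Popen]:
    """Start each command in the background, writing its output to <name>.log."""
    log_dir.mkdir(parents=True, exist_ok=True)
    procs: List[subprocess.Popen] = []
    for name, cmd in commands.items():
        with open(log_dir / f"{name}.log", "w") as log:
            # The child keeps its own copy of the file handle after we close ours.
            procs.append(subprocess.Popen(list(cmd), stdout=log, stderr=subprocess.STDOUT))
    return procs


def stop(procs: List[subprocess.Popen], timeout_s: float = 5.0) -> None:
    """Ask every still-running process to terminate, then force-kill stragglers."""
    for proc in procs:
        if proc.poll() is None:
            proc.terminate()
    for proc in procs:
        try:
            proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
```

A run would then look like `procs = launch(NODES, Path("."))`, followed by a `time.sleep(...)` long enough for the election to settle, and finally `stop(procs)`. On Windows, `terminate()` ends the process immediately, which is the same outcome as killing it from Task Manager.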

_Originally posted by @freddyrios in https://github.com/dotnet/dotNext/discussions/167#discussioncomment-6062222_

Issue Analytics

Created 4 months ago
Comments: 24 (11 by maintainers)

Top GitHub Comments

freddyrios commented, Jun 7, 2023

works great, thanks!

Ran the example reproduction at least 15 times, and in most cases it properly elected the leader in term 1. The other case was when all nodes became candidates at nearly the same time, rejected each other's votes, and then elected a leader in term 2 (as expected in Raft).
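
The term 2 case is the ordinary Raft split vote, which can be illustrated with a toy model. This is a simplified sketch, not dotNext's implementation: `run_election` and `elect` are hypothetical names, the timeout values are made up for illustration, message latency is modeled as a fixed "tie window", and the log up-to-date check that real Raft applies before granting a vote is ignored.

```python
from typing import Dict, List, Optional, Tuple


def run_election(timeouts_ms: Dict[str, int], tie_window_ms: int = 5) -> Optional[str]:
    """Simulate one term: return the winner, or None on a split vote.

    Every node whose election timeout fires within `tie_window_ms` of the
    earliest one becomes a candidate before any vote request reaches it.
    """
    earliest = min(timeouts_ms.values())
    candidates = sorted(
        (n for n, t in timeouts_ms.items() if t - earliest <= tie_window_ms),
        key=lambda n: timeouts_ms[n],
    )
    # A candidate votes for itself, so it rejects every other candidate in this term.
    votes = {c: 1 for c in candidates}
    # Nodes still in follower state grant their single vote to the first request,
    # assumed here to come from the candidate with the earliest timeout.
    followers = [n for n in timeouts_ms if n not in votes]
    votes[candidates[0]] += len(followers)
    majority = len(timeouts_ms) // 2 + 1
    for candidate, count in votes.items():
        if count >= majority:
            return candidate
    return None


def elect(rounds: List[Dict[str, int]]) -> Tuple[int, Optional[str]]:
    """Run successive terms, each with freshly drawn timeouts, until one has a winner."""
    term = 0
    for timeouts in rounds:
        term += 1
        winner = run_election(timeouts)
        if winner is not None:
            return term, winner
    return term, None


if __name__ == "__main__":
    rounds = [
        {"node2": 159, "node3": 160, "node4": 161},  # all time out together: split vote
        {"node2": 159, "node3": 170, "node4": 290},  # node2 clearly first: wins
    ]
    print(elect(rounds))
```

In the first round every node holds only its own vote, so nobody reaches the majority of 2 out of 3 and the term ends without a leader. In the second round the timeouts are spread out, one node asks for votes before the others time out, and it is elected in term 2.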

freddyrios commented, Jun 6, 2023

FYI I pulled the latest develop after my last message and reproduced it again.