(DotNext.Net.Cluster) Leader does not receive IRaftCluster.LeaderChanged event when downgrading to follower

Hello,

I am currently running version 4.2.0-beta.1 and have come across an issue with writing log entries with Raft and the events associated with it. Steps to reproduce:

1. Have a single node start a standalone cluster (node 1)
2. Add a second member with AddMember (node 2)
3. Kill the process of node 2
4. Start the process of node 2 again

After step 3, my logs suggest that the leader steps down to follower but does not fire the IRaftCluster.LeaderChanged event. On node 1, IRaftCluster.Members still reports the local node 1 (Remote == false) as leader, but attempting to write to the log now results in:

      System.InvalidOperationException: The local cluster member is not a leader
         at DotNext.Threading.Tasks.ValueTaskCompletionSource`1.GetResult(Int16 token) in /_/src/DotNext.Threading/Threading/Tasks/ValueTaskCompletionSource.T.cs:line 272
         at DotNext.Threading.Tasks.ValueTaskCompletionSource`1.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) in /_/src/DotNext.Threading/Threading/Tasks/ValueTaskCompletionSource.T.cs:line 279
         at DotNext.Net.Cluster.Consensus.Raft.LeaderState.ReplicationCallback.Invoke() in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/LeaderState.Replication.cs:line 177
      --- End of stack trace from previous location ---
         at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.ReplicateAsync[TEntry](TEntry entry, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 818

IClusterMember.MemberStatusChanged will correctly fire and set node 2 to Unavailable during step 3. This behavior was not present before the upgrade to 4.2.0-beta.1.

After step 4, IClusterMember.MemberStatusChanged correctly fires on node 1 and marks node 2 as available again. However, node 1 is still unable to write to the log, seemingly because it is no longer the leader. Only some time after this step does node 1 correctly fire IRaftCluster.LeaderChanged, first with no leader, and then again with a leader.

Top GitHub Comments

RyanTT commented, Jan 25, 2022

I am using HTTP transport.

sakno commented, May 2, 2022

Closing this issue due to lack of activity.
