(DotNext.Net.Cluster) Leader does not receive IRaftCluster.LeaderChanged event when downgrading to follower

Hello,

I am currently running version 4.2.0-beta.1 and have come across an issue with writing Raft log entries and the events associated with it.

Steps to reproduce:

  1. Have a single node start a standalone cluster (first node)
  2. Add a second member with AddMember (second node)
  3. Kill the process of node 2
  4. Start the process of node 2 again

After step 3, my logs suggest that the leader (node 1) steps down to follower but does not fire the IRaftCluster.LeaderChanged event. On node 1, IRaftCluster.Members still reports the local node (Remote == false) as leader, but attempting to write to the log now fails with:

      System.InvalidOperationException: The local cluster member is not a leader
         at DotNext.Threading.Tasks.ValueTaskCompletionSource`1.GetResult(Int16 token) in /_/src/DotNext.Threading/Threading/Tasks/ValueTaskCompletionSource.T.cs:line 272
         at DotNext.Threading.Tasks.ValueTaskCompletionSource`1.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) in /_/src/DotNext.Threading/Threading/Tasks/ValueTaskCompletionSource.T.cs:line 279
         at DotNext.Net.Cluster.Consensus.Raft.LeaderState.ReplicationCallback.Invoke() in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/LeaderState.Replication.cs:line 177
      --- End of stack trace from previous location ---
         at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.ReplicateAsync[TEntry](TEntry entry, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 818

IClusterMember.MemberStatusChanged will correctly fire and set node 2 to Unavailable during step 3. This behavior was not present before the upgrade to 4.2.0-beta.1.

After step 4, IClusterMember.MemberStatusChanged correctly fires on node 1 and marks node 2 as available again. However, node 1 is still unable to write to the log, seemingly because it is no longer the leader. Only some time after this step does node 1 fire IRaftCluster.LeaderChanged, first with NO LEADER and then again WITH A LEADER.
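
Until the missing event is explained, a caller could guard writes against the window described above, in which the node still looks like the leader but the write throws InvalidOperationException. The following is only a minimal sketch of that idea and not a fix for the underlying behavior. `LeaderWriteGuard`, `TryWriteAsync`, and the `replicate` delegate are hypothetical names introduced here; `replicate` stands in for whatever call the application uses to write to the log, and the sketch uses only .NET base class library types.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of DotNext.
public static class LeaderWriteGuard
{
    // Invokes `replicate` up to `maxAttempts` times, pausing `delay` between attempts.
    // `replicate` is an assumed stand-in for the application's own log write call.
    // Returns true once a write succeeds, false if every attempt failed with
    // InvalidOperationException (e.g. "The local cluster member is not a leader").
    public static async Task<bool> TryWriteAsync(
        Func<CancellationToken, Task> replicate,
        int maxAttempts,
        TimeSpan delay,
        CancellationToken token = default)
    {
        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                await replicate(token).ConfigureAwait(false);
                return true;
            }
            catch (InvalidOperationException)
            {
                // Last attempt failed: give up instead of waiting again.
                if (attempt == maxAttempts)
                    break;

                // Give the cluster time to settle on a leader before retrying.
                await Task.Delay(delay, token).ConfigureAwait(false);
            }
        }

        return false;
    }
}
```

A caller would wrap its own write in the delegate and treat a `false` result as "leadership is currently unknown" instead of trusting the member list alone. Retrying only helps if the local node becomes leader again, as it eventually does in the report above.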

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8

Top GitHub Comments

RyanTT commented, Jan 25, 2022 (1 reaction)

I am using HTTP transport.

sakno commented, May 2, 2022 (0 reactions)

Closing this issue due to lack of activity.
