Issues after members are reconfigured

I’m testing your Raft implementation and I’m seeing issues when the cluster configuration changes at runtime. The goal is to simulate a Kubernetes cluster, where containers are routinely recreated with new IP addresses.

I have modified the RaftNode project to be able to update the configuration. I introduced appsettings.json and moved the in-memory configuration from Program.cs into it. This allows me to update the config file at any time (the members collection) and that change is picked up by all three running instances of RaftNode.
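
For illustration, an appsettings.json along these lines could hold the members collection. This is only a sketch, not the file used in the issue: the `members` key name follows the wording above, and the localhost URLs and ports are hypothetical. Any other settings the node needs are omitted.

```json
{
  // Hypothetical addresses: one HTTP endpoint per RaftNode instance.
  // To simulate a recreated container, change one entry's port
  // and restart that node on the new port.
  "members": [
    "http://localhost:3262",
    "http://localhost:3263",
    "http://localhost:3264"
  ]
}
```

The comments are for illustration only; remove them if your JSON tooling rejects comments.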

When there are no configuration changes, the cluster is very stable. I can restart any RaftNode instance (either the leader or any of the followers) and it immediately rejoins the cluster. If a new leader election is needed, it usually completes within one term. However, when I stop a node, change its port number in the config file, and then start it again on the new port, I sometimes observe these issues:

  1. The cluster has a hard time electing a new leader. It can take many terms before a leader is established and both followers receive values. Typically, after the new node joins the cluster, the following sequence repeats multiple times: a new election is held, one of the original two nodes becomes the leader, those two nodes exchange a couple of values, but the new node receives nothing, times out, and calls for a new election.
  2. This occurred only once, and it may not be related to the configuration changes. It may also have happened before I pulled your latest changes (I no longer remember exactly when). What I observed: after I stopped the leader once, it could not be restarted. Every time I attempted to start it again, the whole process crashed immediately in PersistentState.ApplyAsync(long startIndex, CancellationToken token) on the line with Debug.Fail.

I’m running this on Windows, using HTTP for communication and persistent storage. The node with the modified port reuses the same storage.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 14

Top GitHub Comments

1 reaction
sakno commented, Dec 16, 2020

It was just for tests. No specific requirements for the partition size now. I’ll publish a new version tonight.

1 reaction
sakno commented, Dec 16, 2020

@potrusil-osi, could you please check again?
