Issues after members are reconfigured
I’m testing your Raft implementation and I observe issues when the cluster configuration changes at runtime. The point is to simulate behavior in a Kubernetes cluster, where containers are always recreated with new IP addresses.
I have modified the RaftNode project so that its configuration can be updated. I introduced appsettings.json and moved the in-memory configuration from Program.cs into it. This allows me to update the config file (the members collection) at any time, and the change is picked up by all three running instances of RaftNode.
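For illustration only, such an appsettings.json might look roughly like the sketch below. The key names are an assumption based on the kind of settings the in-memory configuration would hold, and the ports are placeholders; neither is copied from the actual file used in this test. The comment inside the file relies on the .NET JSON configuration reader skipping comments.

```json
{
  // Hypothetical values: key names and ports are assumptions, not the real test setup.
  "partitioning": false,
  "lowerElectionTimeout": 150,
  "upperElectionTimeout": 300,
  "members": [
    "http://localhost:3262",
    "http://localhost:3263",
    "http://localhost:3264"
  ]
}
```

With a layout like this, moving a node to a new port means editing its entry in the members array, for example replacing the third URL with one that uses a different port.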
When there are no configuration changes, the cluster is very stable - I can restart any RaftNode instance (either the leader or any of the followers) and it immediately rejoins the cluster. If a new leader election is needed, it usually completes within one term. However, when I stop a node, change its port number in the config file, and then start it again with the new port, I sometimes observe these issues:
- The cluster has a hard time electing a new leader - it can take many terms until the leader is established and both followers receive values. Typically, after the new node joins the cluster, the following repeats multiple times: there is a new election, one of the original two nodes becomes the leader, those two nodes exchange a couple of values, but the new node doesn’t receive anything, so it times out and calls for a new election.
- This occurred only once and it may not be related to the configuration changes. It may also have happened before I got the latest changes from you (I no longer remember exactly when). Anyway, this is what I observed: once, after I stopped the leader, it could not be restarted - every time I attempted to start it again, the whole process crashed immediately in PersistentState.ApplyAsync(long startIndex, CancellationToken token) on the line with Debug.Fail.
I’m running this on Windows, using HTTP for communication and I use persistent storage - the node with the modified port reuses the storage.
Issue Analytics
- Created 3 years ago
- Comments: 14
It was just for tests. No specific requirements for the partition size now. I’ll publish a new version tonight.
@potrusil-osi, could you please check again?