possible simpler design?
Hello @donbeave @kikov79 @harai @cvallance @ericln,
Awesome project… I’ve been looking at setting up MongoDB with replica sets on Kubernetes, and I found this project, so I’m pretty stoked.
Yet, I’m a little wary of deploying (yet another) NodeJS application in my cluster… It seems a little overkill to run this as a sidecar permanently, so I wanted to pick your brain(s) about another approach: you’ve obviously spent a bunch of time on this, so you might be able to point out the flaws in the concept right away, before I spend too much time working on it.
Here are the assumptions:
- A replica set needs to be initialized on the master node, and the master node only.
- MongoDB pods are deployed with a ReplicationController, so if a pod dies it gets replaced; a script run on postStart can then check for pods that are gone and pods that came in, and reconfigure the master accordingly.
- A headless service provides a list of IPs for the pods referenced by the mongoDB label (e.g. via a plain DNS lookup, as sketched below).
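To illustrate that last assumption (the service name `mongo` in the `default` namespace is hypothetical; use whatever the headless service is actually called):

```sh
# A headless service has no cluster IP, so its DNS name resolves straight to
# the backing pod IPs: one A record per pod matching the selector.
nslookup mongo.default.svc.cluster.local
```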
I’m thinking all this can easily be done with a shell script:
- Looking up pods with the apiserver is just a `curl` command. I use this all the time and use `jq` to parse the JSON output.
- Finding the master (or lack thereof on bootstrap) can be done with the `mongo` shell from the command line.
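For the first point, something along these lines should do — a minimal sketch assuming a `role=mongo` label (the label selector is made up) and the standard in-cluster service-account mounts:

```sh
# List the IPs of all pods labelled role=mongo via the apiserver.
SA=/var/run/secrets/kubernetes.io/serviceaccount
TOKEN="$(cat $SA/token)"
NS="$(cat $SA/namespace)"

curl -s --cacert "$SA/ca.crt" \
  -H "Authorization: Bearer ${TOKEN}" \
  "https://kubernetes.default.svc/api/v1/namespaces/${NS}/pods?labelSelector=role%3Dmongo" \
  | jq -r '.items[].status.podIP'
```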
So the idea is the following: on the container lifecycle `postStart` hook, run a shell script in the pod that will:
- Look up the IPs of all MongoDB pods.
- Run the mongo shell `rs.status()` command against each mongo pod to find whether there is a master, or an already-initialized replica set (a sketch of this logic follows the list).
  -> If there is a master, list all pods in the replica set with the mongo shell `rs.conf()` command on the master, and add/remove pods according to what the Kubernetes apiserver pod list provides.
  -> If there is no master, `rs.initiate()` the current pod as master, and add the other pods as replicas.
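Roughly, that decision tree could look like this — purely a sketch, where `pod_ips` stands for the curl/jq lookup above and `$POD_IP` is assumed to be injected via the downward API:

```sh
# Find the current master, if any, by asking each pod.
MASTER=""
for ip in $(pod_ips); do
  # rs.isMaster().ismaster is true only on the current primary.
  if mongo --host "$ip" --quiet --eval 'rs.isMaster().ismaster' | grep -q true; then
    MASTER="$ip"
    break
  fi
done

if [ -n "$MASTER" ]; then
  # A master exists: read the current member list and reconcile it against
  # the apiserver pod list with rs.add()/rs.remove() on the master.
  mongo --host "$MASTER" --quiet --eval 'rs.conf().members.forEach(function(m){ print(m.host) })'
  # ... diff against pod_ips and add/remove the difference (omitted) ...
else
  # No master yet: initiate the replica set here and add the other pods.
  mongo --quiet --eval 'rs.initiate()'
  for ip in $(pod_ips); do
    [ "$ip" = "$POD_IP" ] || mongo --quiet --eval "rs.add('${ip}:27017')"
  done
fi
```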
Obviously, this could cause problems if a ReplicationController starts many pods at once on bootstrap (race condition to become the master and create the replica set). One could acquire a lock by setting a key in etcd (a sketch follows), but that makes things more complicated. If the process is assumed to bootstrap one mongo pod first, then scale, that seems very reasonable to me.
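For completeness, the etcd lock is nearly a one-liner with the v2 API — assuming a reachable endpoint (the `http://etcd:2379` address and key name here are made up):

```sh
# prevExist=false makes the PUT atomic: exactly one pod gets "action":"create"
# back, and only that pod runs rs.initiate().
if curl -s -XPUT "http://etcd:2379/v2/keys/mongo-rs-init?prevExist=false" \
     -d value="$HOSTNAME" | grep -q '"action":"create"'; then
  mongo --quiet --eval 'rs.initiate()'
fi
```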
When a pod dies, the RC will restart it, and the reconfig will happen on postStart, so there doesn’t seem to be a need for a worker running at all times checking the state of the cluster: if a node is gone, it will reconnect, rejoin, and clear up its old IP.
Leftover removed pods may only be a problem when scaling down. A similar reconfig script could be run on the container lifecycle `preStop` hook (sketched below).
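In the simplest case, assuming the same `$MASTER` and `$POD_IP` variables as above, that preStop script might be a single statement:

```sh
# Deregister this pod from the replica set before it shuts down.
mongo --host "$MASTER" --quiet --eval "rs.remove('${POD_IP}:27017')"
```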
What are your thoughts?
Cheers
PetSets should be able to help with this.
Closing this as an issue.