Feature request - cordon one node at a time instead of all nodes
See original GitHub issue

With RUN_MODE=1, all old nodes are cordoned at the same time, which makes the AWS ELB mark the old nodes out of service. If the new nodes take some time to come into service, no healthy instances are left for a while, which causes an outage.

We tried cordoning one node at a time and did not see this issue. The downside is that a pod may bounce multiple times, because it can land on an old node that has not yet been cordoned. Some people will be fine with one pod out of multiple replicas bouncing multiple times.
Can we have a RUN_MODE=5 which is the same as RUN_MODE=1, except that it does “cordon 1 node --> drain 1 node --> delete 1 node” one node at a time, instead of “cordon all nodes --> drain 1 node --> delete 1 node”?
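For illustration, a minimal sketch of what the requested per-node flow might look like, driving kubectl from Python. The node names, the kubectl drain flags, and the overall structure are assumptions for this example, not the tool's actual implementation.

```python
import subprocess

# Example node names - purely illustrative, not taken from a real cluster.
OLD_NODES = ["ip-10-0-1-23.ec2.internal", "ip-10-0-2-45.ec2.internal"]

def rotate_one_node(node_name: str) -> None:
    """Cordon, drain, and hand off a single node for termination before
    touching the next one, so the other old nodes stay in service."""
    # Cordon only this node; the remaining old nodes keep serving traffic.
    subprocess.run(["kubectl", "cordon", node_name], check=True)
    # Drain it, evicting pods onto whichever nodes are still schedulable.
    subprocess.run(
        ["kubectl", "drain", node_name,
         "--ignore-daemonsets", "--delete-emptydir-data"],
        check=True,
    )
    # At this point the instance would be terminated through the ASG
    # (see the terminate_instance_in_auto_scaling_group sketch further down).

for node in OLD_NODES:
    rotate_one_node(node)
```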
Issue Analytics
- State:
- Created 3 years ago
- Reactions: 1
- Comments: 6 (3 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Thanks for the explanation, it does exactly what we expect 😃 I am closing this request, as the TAINT_NODES=true option does exactly what we want.

It doesn’t do any cordoning - it’s an alternative strategy.
Interacting with LBs isn’t the purpose of cordoning, to my knowledge - cordoning is about preventing scheduling of new workloads. The effect on service-managed LBs is an unintended side effect, which I believe is why it was removed in Kubernetes 1.19.
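As a rough illustration of the taint-based alternative referred to above: applying a NoSchedule taint keeps new pods off an old node without flipping it to Unschedulable the way cordoning does. The taint key and value below are made-up placeholders, not necessarily what the TAINT_NODES option actually applies.

```python
import subprocess

node = "ip-10-0-1-23.ec2.internal"  # example node name

# Apply a NoSchedule taint instead of cordoning. Existing pods keep running;
# new pods simply avoid the node unless they tolerate the taint.
subprocess.run(
    ["kubectl", "taint", "nodes", node, "rolling-update=in-progress:NoSchedule"],
    check=True,
)
```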
The tool uses terminate_instance_in_auto_scaling_group to orchestrate termination of instances in an ASG-aware fashion, and thus ensure your target group deregistration delay is respected, allowing any remaining traffic to drain off the instance before it is actually terminated.

Perhaps you can try it out - I think you will find it does what you expect 😃
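For reference, a minimal sketch of that terminate_instance_in_auto_scaling_group call via boto3. The instance ID is a placeholder, and keeping the desired capacity unchanged is an assumption here so that the ASG launches a replacement node.

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Terminate the drained instance through the ASG rather than via plain EC2
# termination, so that (per the comment above) the target group's
# deregistration delay is respected before the instance goes away.
autoscaling.terminate_instance_in_auto_scaling_group(
    InstanceId="i-0123456789abcdef0",       # placeholder instance ID
    ShouldDecrementDesiredCapacity=False,    # let the ASG replace the node
)
```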