Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Feature request - cordon one node at a time instead of all nodes

See original GitHub issue

With RUN_MODE=1, all old nodes are cordoned at the same time, which causes the AWS ELB to mark the old nodes out of service. If the new nodes take a while to come into service, no healthy instances are left for some time, which causes an outage.

We tried cordoning one node at a time and didn't see this issue. The downside is that a pod may bounce multiple times, because it can land on an old node that hasn't been cordoned yet. Some people will be fine with one pod among multiple replicas bouncing a few times.

Can we have a RUN_MODE 5 that is the same as RUN_MODE 1, except that it does "cordon 1 node --> drain 1 node --> delete 1 node" at a time instead of "cordon all nodes --> drain 1 node --> delete 1 node"?
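The proposed per-node flow can be sketched as a shell loop over `kubectl` commands. This is an illustrative sketch of the requested behavior, not an actual mode of the tool; the `KUBECTL` variable exists only so the commands can be stubbed for a dry run.

```shell
# Hypothetical sketch of the requested per-node flow: cordon, drain,
# then delete one node at a time, so at most one old node is ever
# unschedulable and the ELB always has healthy in-service instances.
KUBECTL="${KUBECTL:-kubectl}"

roll_one_node() {
  node="$1"
  "$KUBECTL" cordon "$node"    # stop new pods from landing on this node
  "$KUBECTL" drain "$node" --ignore-daemonsets --delete-emptydir-data
  "$KUBECTL" delete node "$node"    # remove the drained node from the cluster
}

# Process nodes strictly one at a time, in the order given.
for node in "$@"; do
  roll_one_node "$node"
done
```

As the issue notes, the trade-off is that a pod evicted from the first drained node may be rescheduled onto another old node and get evicted again later in the loop.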

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
infa-ddeore commented, Feb 5, 2021

Thanks for the explanation, it does exactly what we expect 😃 I am closing this request, as the TAINT_NODES=true option does what we want.

0 reactions
chadlwilson commented, Feb 5, 2021

It doesn’t do any cordoning - it’s an alternative strategy.

Interacting with LBs isn't the purpose of cordoning, to my knowledge - cordoning is about preventing scheduling of new workloads. The effect on service-managed LBs is an unintended side effect, which I believe is why it was removed in Kubernetes 1.19.
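The tainting alternative the maintainer describes can be sketched with a plain `kubectl taint` command: a NoSchedule taint repels new pods like cordoning does, but does not mark the node Unschedulable, so service load balancers keep it in service. The taint key below is hypothetical for illustration, not necessarily the key the tool applies.

```shell
# Illustrative only: taint an old node so no new pods are scheduled
# there, without flipping it to Unschedulable (which cordoning does).
# The taint key is a made-up example, not the tool's actual key.
KUBECTL="${KUBECTL:-kubectl}"

taint_old_node() {
  node="$1"
  # NoSchedule repels new pods; existing pods keep running and serving traffic
  "$KUBECTL" taint nodes "$node" example.com/rolling-update=in-progress:NoSchedule
}
```

Because existing pods keep running until the node is drained, the ELB health checks continue to pass on the old nodes while replacements come up.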

The tool uses terminate_instance_in_auto_scaling_group to orchestrate termination of instances in an ASG-aware fashion, and thus ensures your target group deregistration delay is respected, allowing any remaining traffic to drain off the instance before it is actually terminated.
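The termination step above can be sketched with the AWS CLI equivalent of that API call. This is a sketch of the general technique, not the tool's actual code; the function name and the `AWS` stub variable are assumptions for illustration.

```shell
# Sketch: terminate an instance through the Auto Scaling API rather than
# plain EC2 terminate, so the ASG deregisters it from its target groups
# first and the deregistration delay drains in-flight connections.
AWS="${AWS:-aws}"

terminate_asg_instance() {
  instance_id="$1"
  "$AWS" autoscaling terminate-instance-in-auto-scaling-group \
    --instance-id "$instance_id" \
    --no-should-decrement-desired-capacity    # ASG launches a replacement node
}
```

Keeping the desired capacity unchanged is what makes this a rolling replacement: the ASG brings up a new instance for each one terminated.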

Perhaps you can try it out - I think you will find it does what you expect 😃

Read more comments on GitHub >

Top Results From Across the Web

Nodes - Kubernetes
Kubernetes runs your workload by placing containers into Pods to run on Nodes. A node may be a virtual or physical machine, depending...
Read more >
Draining or Cordoning a Node - Tencent Cloud
You can cordon a node with one of the following two methods: ... After the node is drained, all Pods (excluding those managed...
Read more >
Managed node groups - Amazon EKS - AWS Documentation
Amazon EKS managed node groups automate the provisioning and lifecycle management of nodes (Amazon EC2 instances) for Amazon EKS Kubernetes clusters.
Read more >
Drain a node on the swarm - Docker Documentation
The swarm manager can assign tasks to any ACTIVE node, so up to now all nodes have been available to receive tasks. Sometimes,...
Read more >
Nodes and Node Pools | Rancher Manager
Rather than using the Rancher UI to make edits such as scaling the number of nodes up or down, edit the cluster directly....
Read more >
