desiredSize in createManagedNodeGroup can conflict with Kubernetes cluster-autoscaler

Problem description

The @pulumi/eks module includes a function, createManagedNodeGroup, which creates the underlying resources that make up a “node group” for EKS.

There are three required scale-related values in a node group’s scalingConfig:

desiredSize: 3, minSize: 3, maxSize: 20
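
For illustration, here is a minimal sketch of passing that scalingConfig to createManagedNodeGroup (the cluster name and role ARN are placeholder assumptions, not values from this issue):

    import * as eks from "@pulumi/eks";

    const cluster = new eks.Cluster("example-cluster");

    // Hypothetical node group using the scaling values quoted above.
    const nodeGroup = eks.createManagedNodeGroup("example-ng", {
      cluster: cluster,
      nodeRoleArn: "arn:aws:iam::123456789012:role/example-node-role", // placeholder
      scalingConfig: {
        desiredSize: 3,
        minSize: 3,
        maxSize: 20,
      },
    });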

Per best practice, the cluster-autoscaler Helm chart is installed into the EKS cluster to manage node (ASG) scale-up and scale-down based on pod workloads.
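
If the chart is managed from the same Pulumi program, one way to deploy it is with @pulumi/kubernetes. The sketch below assumes the eks.Cluster from the snippet above; the chart repo, values, and region are assumptions, and the autoscaler IAM policy is elided:

    import * as k8s from "@pulumi/kubernetes";

    // Sketch: install the cluster-autoscaler chart into the cluster created above.
    const autoscaler = new k8s.helm.v3.Chart("cluster-autoscaler", {
      chart: "cluster-autoscaler",
      namespace: "kube-system",
      fetchOpts: { repo: "https://kubernetes.github.io/autoscaler" }, // assumed repo
      values: {
        autoDiscovery: { clusterName: cluster.eksCluster.name },
        awsRegion: "us-west-2", // placeholder region
      },
    }, { provider: cluster.provider });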

Affected product version(s)

Current @pulumi/eks release - 0.18.19

Reproducing the issue

1. Create a cluster and node group using the above scaling configuration: desiredSize: 3, minSize: 3, maxSize: 20.
2. Deploy the cluster-autoscaler Helm chart and its associated IAM roles, using the default scaling strategy (random). A cluster will be deployed with 3 nodes.
3. Deploy a sample nginx app and scale it to 120 replicas. Run watch kubectl get nodes; you will see new nodes come online to support the resource requests. At this point the actual desiredSize in the ASG no longer matches the Pulumi config. Run pulumi refresh; an update to the scalingConfig will take place.
4. Let’s assume the node group is now sitting at 12 nodes. Modify the scaling configuration: desiredSize: 6, minSize: 3, maxSize: 20.
5. Run pulumi up. The desiredSize will now be set to 6. Run the same watch kubectl get nodes; you will see nodes being destroyed and workloads going into a NotReady state. Eventually the cluster-autoscaler will step in and scale the node count back up.

Suggestions for a fix

Our workaround right now is to set a fairly high desiredSize, near or at the max setting. In practice, each time this stack runs we may end up provisioning more nodes than we need; in testing we end up with excess nodes for ~12 minutes before the autoscaler takes over and provisions new nodes. This causes degradation for 10-15 minutes.

What we’d like to be able to do is either a) have desiredSize be optional, since it is not a particularly useful setting when using EKS with the cluster-autoscaler, or b) apply ignoreChanges to that attribute.

Because createManagedNodeGroup is a function that creates Pulumi resources but is not itself a Pulumi resource, there is no effective way to set ignoreChanges on that attribute.
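
One possible workaround at the program level (a sketch using Pulumi’s stack transformations, not something createManagedNodeGroup itself supports) is to inject ignoreChanges into the options of every underlying aws.eks.NodeGroup resource:

    import * as pulumi from "@pulumi/pulumi";

    // Sketch: createManagedNodeGroup exposes no resource options, so rewrite
    // the options of the aws.eks.NodeGroup resources it creates under the hood.
    pulumi.runtime.registerStackTransformation(args => {
      if (args.type === "aws:eks/nodeGroup:NodeGroup") {
        return {
          props: args.props,
          opts: pulumi.mergeOptions(args.opts, {
            ignoreChanges: ["scalingConfig.desiredSize"],
          }),
        };
      }
      return undefined;
    });

The transformation has to be registered before the node groups are created, so this call should run early in the program.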

Example snippet of cluster creation

    // Grab the cluster name
    this.ClusterName = this.eksCluster.eksCluster.name;

    // Create managed node groups
    this.nodeGroups = clusterOpts.nodeGroups.map(ng => {
      return eks.createManagedNodeGroup(
        // Name
        ng.nodeGroupName.toString(),
        // ManagedNodeGroupOptions
        {
          nodeRoleArn: GetARN(ng.role),
          cluster: this.eksCluster,
          subnetIds: privateSubnetIds,
          ...ng
        },
        // Pulumi parent
        this.eksCluster
      );
    });

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 9 (2 by maintainers)

Top GitHub Comments

1 reaction
abatilo commented, Sep 4, 2021

The HashiCorp terraform-aws-eks module automatically ignores changes to desired_size: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/013afb0cc6ff6178d72ec8639827ee023e61753b/modules/node_groups/node_groups.tf#L102

Maybe we could take some inspiration and make this configurable somehow?

1 reaction
casey-robertson commented, Mar 20, 2020

Yep! Good to go.

