[aws-eks] Stack breaks when upgrading an EKS Cluster


When upgrading an EKS cluster, the CloudFormation stack breaks: it ends up in a failed state and can no longer be restored. I’m managing my own EC2 nodes.

Reproduction Steps

Here’s a code example of how to cause this issue:


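// Imports assume the CDK v1 package layout that matches CLI 1.36.0.
import * as cdk from '@aws-cdk/core';
import * as ec2 from '@aws-cdk/aws-ec2';
import * as eks from '@aws-cdk/aws-eks';
import * as autoscaling from '@aws-cdk/aws-autoscaling';

export class EKSCluster extends cdk.Stack {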
   public readonly eksCluster: eks.Cluster

   constructor(app: cdk.App, id: string, props?: cdk.StackProps) {
       super(app, id, props);
       const clusterVersion = "1.14"
       const workerNodesVersion = "1.14"

       const vpc = new ec2.Vpc(this, 'VPC')
       this.eksCluster = new eks.Cluster(this, 'Cluster', {
           defaultCapacity: 0,
           version: clusterVersion,
           vpc: vpc
       });

       const onDemandASG = new autoscaling.AutoScalingGroup(this, 'OnDemandASG', {
           vpc: vpc,
           minCapacity: 2,
           maxCapacity: 10,
           instanceType: new ec2.InstanceType('m5.xlarge'),
           machineImage: new eks.EksOptimizedImage({
               kubernetesVersion: workerNodesVersion,
               nodeType: eks.NodeType.STANDARD  // without this, an incorrect SSM parameter for the AMI is resolved
           }),
           updateType: autoscaling.UpdateType.ROLLING_UPDATE,
           rollingUpdateConfiguration: {
             maxBatchSize: 1,
             minInstancesInService: 2,
             waitOnResourceSignals: true,
             pauseTime: cdk.Duration.minutes(1),
             minSuccessfulInstancesPercent: 100
           }
       });
       this.eksCluster.addAutoScalingGroup(onDemandASG, {
           bootstrapEnabled: true,
           mapRole: true
       })
   }
}

Now do the following:

  1. Set workerNodesVersion to 1.15
  2. Deploy the stack. The deployment succeeds and the stack is still healthy.
  3. Set clusterVersion to 1.15
  4. Deploy the stack. The deployment fails, and the stack cannot roll back to 1.14 because it is already on 1.15.
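
The steps above amount to two deployments that each change exactly one version constant. As a rough sketch (the `Versions` type and `changedFields` helper are hypothetical names used only for illustration, not part of the CDK), the sequence can be written down to make explicit which component moves in each deployment:

```typescript
// Hypothetical helper type; not part of the CDK.
interface Versions {
  clusterVersion: string;
  workerNodesVersion: string;
}

// The three states the repro passes through:
// initial deployment, after step 2, and after step 4.
const deployments: [Versions, Versions, Versions] = [
  { clusterVersion: "1.14", workerNodesVersion: "1.14" },
  { clusterVersion: "1.14", workerNodesVersion: "1.15" },
  { clusterVersion: "1.15", workerNodesVersion: "1.15" },
];

// Returns the names of the fields that differ between two deployments.
function changedFields(prev: Versions, next: Versions): (keyof Versions)[] {
  const keys: (keyof Versions)[] = ["clusterVersion", "workerNodesVersion"];
  return keys.filter((k) => prev[k] !== next[k]);
}

// One entry per deployment after the initial one.
const changes = [
  changedFields(deployments[0], deployments[1]),
  changedFields(deployments[1], deployments[2]),
];

console.log(changes);
```

Only the second of these deployments fails in the report, and it is the one where `clusterVersion` alone changes, which appears consistent with the failed resources in the error log all being cluster outputs.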

Error Log

Error from cfn:

CustomResource attribute error: Vendor response doesn't contain CertificateAuthorityData key in object arn:aws:cloudformation:eu-central-1:XXXX:stack/XXXX/95d476f0-8e18-11ea-98f7-02433c861a1c|Cluster9EE0221C|50b09ee6-82a7-43c3-ae99-523a068c48b5 in S3 bucket cloudformation-custom-resource-storage-eucentral1

CustomResource attribute error: Vendor response doesn't contain Arn key in object arn:aws:cloudformation:eu-central-1:XXXX:stack/XXXX/95d476f0-8e18-11ea-98f7-02433c861a1c|Cluster9EE0221C|50b09ee6-82a7-43c3-ae99-523a068c48b5 in S3 bucket cloudformation-custom-resource-storage-eucentral1

CustomResource attribute error: Vendor response doesn't contain Endpoint key in object arn:aws:cloudformation:eu-central-1:XXXX:stack/XXXX/95d476f0-8e18-11ea-98f7-02433c861a1c|Cluster9EE0221C|50b09ee6-82a7-43c3-ae99-523a068c48b5 in S3 bucket cloudformation-custom-resource-storage-eucentral1

The following resource(s) failed to update: [ClusterEndpoint352C929A, ClusterCertificate0B8F68BF, ClusterArnSSM9C28FFC5]. 
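
To see at a glance which attributes the custom resource failed to return, the messages above can be scanned for the missing key names. This is only a small sketch: `missingAttributes` is a hypothetical helper, not part of the CDK or CloudFormation tooling, and the sample log lines have their ARNs shortened.

```typescript
// Hypothetical helper, not part of the CDK or CloudFormation tooling.
// Extracts attribute names from messages of the form
// "Vendor response doesn't contain <Key> key in object ...".
function missingAttributes(log: string): string[] {
  const pattern = /Vendor response doesn't contain (\w+) key/g;
  const keys: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(log)) !== null) {
    const key = match[1];
    if (key) {
      keys.push(key);
    }
  }
  return keys;
}

// Sample input modeled on the error log above (ARNs shortened for readability).
const sampleLog = [
  "CustomResource attribute error: Vendor response doesn't contain CertificateAuthorityData key in object arn:...",
  "CustomResource attribute error: Vendor response doesn't contain Arn key in object arn:...",
  "CustomResource attribute error: Vendor response doesn't contain Endpoint key in object arn:...",
].join("\n");

const missing = missingAttributes(sampleLog);
console.log(missing);
```

For the log in this report, the three extracted keys line up with the three resources that failed to update (certificate, ARN, and endpoint).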

Environment

  • CLI Version: 1.36.0
  • Framework Version:
  • OS:
  • Language: TypeScript

Other


This is 🐛 Bug Report

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

eladb commented, May 11, 2020 (2 reactions)

LOL, eventually we’ll turn this module into a decent thing.

moatazelmasry2 commented, May 11, 2020 (0 reactions)

@eladb sorry, I only got around to testing this yesterday. It worked great!!! So thank you for that. It is now possible to do a cluster upgrade without breaking the stack AND without cfn returning too early. Good work!!!


