[aws-eks] Stack breaks when upgrading an EKS Cluster
When upgrading an EKS cluster, the CloudFormation stack breaks: it ends up in a failed state and can no longer be restored. I'm managing my own EC2 nodes.
Reproduction Steps
Here’s a code example of how to cause this issue:
```typescript
import * as cdk from '@aws-cdk/core';
import * as ec2 from '@aws-cdk/aws-ec2';
import * as eks from '@aws-cdk/aws-eks';
import * as autoscaling from '@aws-cdk/aws-autoscaling';

export class EKSCluster extends cdk.Stack {
  public readonly eksCluster: eks.Cluster;

  constructor(app: cdk.App, id: string, props?: cdk.StackProps) {
    super(app, id, props);

    // Kubernetes versions for the EKS control plane and for the self-managed worker AMI.
    const clusterVersion = "1.14";
    const workerNodesVersion = "1.14";

    const vpc = new ec2.Vpc(this, 'VPC');

    // Cluster without default capacity; worker nodes are added below.
    this.eksCluster = new eks.Cluster(this, 'Cluster', {
      defaultCapacity: 0,
      version: clusterVersion,
      vpc: vpc,
    });

    // Self-managed worker nodes using the EKS-optimized AMI.
    const onDemandASG = new autoscaling.AutoScalingGroup(this, 'OnDemandASG', {
      vpc: vpc,
      minCapacity: 2,
      maxCapacity: 10,
      instanceType: new ec2.InstanceType('m5.xlarge'),
      machineImage: new eks.EksOptimizedImage({
        kubernetesVersion: workerNodesVersion,
        nodeType: eks.NodeType.STANDARD, // without this, the wrong SSM parameter for the AMI is resolved
      }),
      updateType: autoscaling.UpdateType.ROLLING_UPDATE,
      rollingUpdateConfiguration: {
        maxBatchSize: 1,
        minInstancesInService: 2,
        waitOnResourceSignals: true,
        pauseTime: cdk.Duration.minutes(1),
        minSuccessfulInstancesPercent: 100,
      },
    });

    // Join the nodes to the cluster and map their IAM role in aws-auth.
    this.eksCluster.addAutoScalingGroup(onDemandASG, {
      bootstrapEnabled: true,
      mapRole: true,
    });
  }
}
```
Now do the following:
- Set `workerNodesVersion` to `1.15` and deploy the stack. The deployment succeeds and the stack remains healthy.
- Set `clusterVersion` to `1.15` and deploy the stack. The deployment fails, and the stack cannot roll back to 1.14 because the cluster itself is already on 1.15.
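The rollback failure follows from how EKS handles control-plane versions: assuming the documented rule that a cluster can only be upgraded one minor version at a time and can never be downgraded, a CloudFormation rollback from 1.15 back to 1.14 asks for a change EKS will not perform. A minimal sketch of that rule, where `parseVersion` and `isAllowedControlPlaneChange` are hypothetical helpers (not part of the CDK):

```typescript
// Hypothetical helper: parses a "major.minor" string such as "1.14" into [major, minor].
function parseVersion(v: string): [number, number] {
  const [major, minor] = v.split('.').map(Number);
  if (!Number.isInteger(major) || !Number.isInteger(minor)) {
    throw new Error(`unexpected version string: ${v}`);
  }
  return [major, minor];
}

// Hypothetical check, assuming EKS allows no change or a single minor-version
// step forward within the same major version, and never a downgrade.
function isAllowedControlPlaneChange(from: string, to: string): boolean {
  const [fromMajor, fromMinor] = parseVersion(from);
  const [toMajor, toMinor] = parseVersion(to);
  if (fromMajor !== toMajor) return false;
  const delta = toMinor - fromMinor;
  return delta === 0 || delta === 1;
}
```

Under this rule the forward step in the reproduction (`1.14` to `1.15`) is allowed, while the rollback CloudFormation attempts (`1.15` to `1.14`) is not, which is why the stack gets stuck.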
Error Log
Error from CloudFormation:

```
CustomResource attribute error: Vendor response doesn't contain CertificateAuthorityData key in object arn:aws:cloudformation:eu-central-1:XXXX:stack/XXXX/95d476f0-8e18-11ea-98f7-02433c861a1c|Cluster9EE0221C|50b09ee6-82a7-43c3-ae99-523a068c48b5 in S3 bucket cloudformation-custom-resource-storage-eucentral1
CustomResource attribute error: Vendor response doesn't contain Arn key in object arn:aws:cloudformation:eu-central-1:XXXX:stack/XXXX/95d476f0-8e18-11ea-98f7-02433c861a1c|Cluster9EE0221C|50b09ee6-82a7-43c3-ae99-523a068c48b5 in S3 bucket cloudformation-custom-resource-storage-eucentral1
CustomResource attribute error: Vendor response doesn't contain Endpoint key in object arn:aws:cloudformation:eu-central-1:XXXX:stack/XXXX/95d476f0-8e18-11ea-98f7-02433c861a1c|Cluster9EE0221C|50b09ee6-82a7-43c3-ae99-523a068c48b5 in S3 bucket cloudformation-custom-resource-storage-eucentral1
The following resource(s) failed to update: [ClusterEndpoint352C929A, ClusterCertificate0B8F68BF, ClusterArnSSM9C28FFC5].
```
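One reading of these errors: the cluster custom resource's response to the Update request did not include the `Arn`, `Endpoint` and `CertificateAuthorityData` attributes, so every resource that reads them with `Fn::GetAtt` (here `ClusterEndpoint`, `ClusterCertificate`, `ClusterArnSSM`) failed. The sketch below is not the aws-eks module's actual handler; `buildResponse`, `ClusterDescription` and the injected `describe` function are hypothetical stand-ins for a handler that calls EKS DescribeCluster. It only illustrates that Create and Update responses both need to carry the attributes.

```typescript
// Hypothetical subset of what a DescribeCluster call would return.
interface ClusterDescription {
  arn: string;
  endpoint: string;
  certificateAuthorityData: string;
}

// Hypothetical subset of a CloudFormation custom resource request.
interface ResourceEvent {
  RequestType: 'Create' | 'Update' | 'Delete';
  PhysicalResourceId?: string;
}

interface HandlerResponse {
  PhysicalResourceId: string;
  Data?: Record<string, string>;
}

// Returns the attributes on Create *and* Update; leaving Data out on Update
// would produce "Vendor response doesn't contain ... key" errors for GetAtt readers.
function buildResponse(
  event: ResourceEvent,
  clusterName: string,
  describe: (name: string) => ClusterDescription,
): HandlerResponse {
  if (event.RequestType === 'Delete') {
    return { PhysicalResourceId: event.PhysicalResourceId ?? clusterName };
  }
  const cluster = describe(clusterName);
  return {
    PhysicalResourceId: clusterName,
    Data: {
      Arn: cluster.arn,
      Endpoint: cluster.endpoint,
      CertificateAuthorityData: cluster.certificateAuthorityData,
    },
  };
}
```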
Environment
- CLI Version: 1.36.0
- Framework Version:
- OS:
- Language: TypeScript
Other
This is 🐛 Bug Report
Top GitHub Comments
LOL, eventually we’ll turn this module into a decent thing.
@eladb Sorry, I only got to test this yesterday. It worked great! Thank you for that; it is now possible to upgrade a cluster without breaking the stack and without CloudFormation returning too early. Good work!
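"Without CloudFormation returning too early" refers to waiting for the EKS version update to finish before reporting the resource as updated. A hedged sketch of that idea, modeled loosely on the `onEvent`/`isComplete` split of the CDK custom-resources Provider framework (whose `isComplete` result carries `IsComplete` and optional `Data`); `isUpdateComplete` is a hypothetical helper, and the status values assume those reported by the EKS DescribeUpdate API. This is not the code of the actual fix.

```typescript
// Update statuses as reported by EKS DescribeUpdate (assumed here).
type UpdateStatus = 'InProgress' | 'Successful' | 'Failed' | 'Cancelled';

interface IsCompleteResult {
  IsComplete: boolean;
  Data?: Record<string, string>;
}

// Hypothetical isComplete-style check: only report completion (and the
// attributes GetAtt readers need) once the version update has finished.
function isUpdateComplete(status: UpdateStatus, attributes: Record<string, string>): IsCompleteResult {
  switch (status) {
    case 'InProgress':
      // Not done yet: the framework polls again later instead of returning early.
      return { IsComplete: false };
    case 'Successful':
      return { IsComplete: true, Data: attributes };
    default:
      // Failed or Cancelled: surface the error so CloudFormation fails the update.
      throw new Error(`cluster version update ended with status ${status}`);
  }
}
```

Polling until `Successful` keeps CloudFormation from moving on to dependent resources while the control plane is still upgrading.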