(aws-ecs): hanging on deleting a stack with ASG capacity provider
What is the problem?
Deleting a stack that uses AsgCapacityProvider hangs unexpectedly.
This is surprising because we had no such issue with the now-deprecated addCapacity, and there are no ECS tasks in the ASG when we delete the stack.
The behaviour seems to be caused by the default enableManagedTerminationProtection = true.
See the discussion in the original closed issue and my unaddressed comment: https://github.com/aws/aws-cdk/issues/14732#issuecomment-991402770.
Reproduction Steps
Please see https://github.com/aws/aws-cdk/issues/14732.
In short, try to delete a stack containing an ECS cluster that uses the AsgCapacityProvider defaults.
What did you expect to happen?
Either:
- CloudFormation does not hang but fails as fast as possible with an error message about the termination protection.
- The stack is successfully deleted as there are no running ECS tasks anymore.
What actually happened?
The CF stack got stuck in DELETE_IN_PROGRESS.
CDK CLI Version
2.3.0
Framework Version
2.3.0
Node.js Version
v16.8.0
OS
macOS
Language
Java
Language Version
11.0.8
Other information
Workaround
My current workaround: set enableManagedTerminationProtection = false on the AsgCapacityProvider.
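For readers who want to see the workaround in context, here is a minimal sketch in TypeScript (the Java builder exposes the same property names). The stack name, construct IDs, instance type, and capacity limits are assumptions chosen for illustration and are not taken from the original report.

```typescript
import * as cdk from 'aws-cdk-lib';
import * as autoscaling from 'aws-cdk-lib/aws-autoscaling';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as ecs from 'aws-cdk-lib/aws-ecs';

// Hypothetical names and sizes, for illustration only.
const app = new cdk.App();
const stack = new cdk.Stack(app, 'EcsAsgStack');

const vpc = new ec2.Vpc(stack, 'Vpc', { maxAzs: 2 });
const cluster = new ecs.Cluster(stack, 'Cluster', { vpc });

const autoScalingGroup = new autoscaling.AutoScalingGroup(stack, 'Asg', {
  vpc,
  instanceType: new ec2.InstanceType('t3.micro'),
  machineImage: ecs.EcsOptimizedImage.amazonLinux2(),
  minCapacity: 0,
  maxCapacity: 2,
});

const capacityProvider = new ecs.AsgCapacityProvider(stack, 'AsgCapacityProvider', {
  autoScalingGroup,
  // The workaround: override the default (true) so that instance
  // scale-in protection does not block deletion of the ASG.
  enableManagedTerminationProtection: false,
  // enableManagedScaling is left at its default here.
});
cluster.addAsgCapacityProvider(capacityProvider);

app.synth();
```

The trade-off is the one the documentation quoted below describes: with the protection off, EC2 Auto Scaling is no longer prevented from terminating instances that still have tasks running on them.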
Documentation questions/enhancement requests
From https://docs.aws.amazon.com/cdk/api/latest/docs/aws-ecs-readme.html (emphasis mine):
By default, an Auto Scaling Group Capacity Provider will manage the Auto Scaling Group’s size for you. It will also enable managed termination protection, in order to prevent EC2 Auto Scaling from terminating EC2 instances that have tasks running on them. If you want to disable this behavior, set both enableManagedScaling to and enableManagedTerminationProtection to false.
- It’s not fully clear from the description that the flag simply blocks deletion of the ASG. I got the incorrect impression that it somehow detects that no ECS tasks are running and allows deletion in that case.
- What are the risks of turning this protection off? E.g. we don’t want ECS tasks to shut down at random times.
- Is it OK to set enableManagedTerminationProtection=false together with enableManagedScaling=true? It seems to work but goes against the documentation (“If you want to disable this behavior, set both enableManagedScaling to and enableManagedTerminationProtection to false.”).
Issue Analytics
- Created 2 years ago
- Reactions:7
- Comments:12 (3 by maintainers)
Top GitHub Comments
Hi everyone, we have the same issue, not just when deleting a cluster, but also when trying to update the AMI ID used for the cluster. Updating the MachineImage in the ASG leads to a new LaunchConfiguration and therefore a new Auto Scaling group. Is there any way around this? Or do we have to write a custom resource to enable and disable termination protection on demand?
The solution suggested by @gshpychka works great for us. In our case, we were experiencing the same problem, not with a capacity provider but with a custom termination policy lambda.
Normally, the CDK wants to delete the ASG, which triggers a scale-in that waits for instances to terminate, but while that happens the CDK is dismantling the roles and permissions of the custom termination policy lambda, so it can no longer tell the ASG that any instances are safe to terminate.
In this case you can create the custom resource, then make it depend on the ASG. That forces your CR to be deleted before the ASG, which force-deletes the ASG, preventing it from calling the custom termination policy.
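The comment above does not include the custom resource itself, so the following is only a sketch of what such a resource might look like, not necessarily the exact solution that was suggested. It assumes CDK v2's AwsCustomResource and the Auto Scaling DeleteAutoScalingGroup API with ForceDelete; the helper name addAsgForceDelete and the construct ID are hypothetical.

```typescript
import { Construct } from 'constructs';
import * as autoscaling from 'aws-cdk-lib/aws-autoscaling';
import * as cr from 'aws-cdk-lib/custom-resources';

// Hypothetical helper: attaches a delete-time hook to an existing ASG.
export function addAsgForceDelete(
  scope: Construct,
  asg: autoscaling.AutoScalingGroup,
): cr.AwsCustomResource {
  const forceDelete = new cr.AwsCustomResource(scope, 'AsgForceDelete', {
    // Only runs when this custom resource is deleted.
    onDelete: {
      service: 'AutoScaling',
      action: 'deleteAutoScalingGroup',
      parameters: {
        AutoScalingGroupName: asg.autoScalingGroupName,
        // Delete the group together with its instances instead of
        // waiting for them to terminate first.
        ForceDelete: true,
      },
    },
    // Broad permissions for brevity; narrow the resources in real use.
    policy: cr.AwsCustomResourcePolicy.fromSdkCalls({
      resources: cr.AwsCustomResourcePolicy.ANY_RESOURCE,
    }),
  });

  // The custom resource depends on the ASG, so on stack deletion
  // it is removed (and its onDelete call runs) before the ASG.
  forceDelete.node.addDependency(asg);
  return forceDelete;
}
```

The dependency is what does the work: resources are deleted in reverse dependency order, so the force-delete call runs while the ASG still exists. Note that a forced delete terminates the instances regardless of what is running on them, so this is only appropriate when the whole cluster is going away.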