(aws-ecs): hanging on deleting a stack with ASG capacity provider

What is the problem?

Deleting a stack that contains an AsgCapacityProvider hangs unexpectedly.

This is surprising, because we had no such issue with the now-deprecated addCapacity, and there are no ECS tasks running in the ASG when we delete the stack.

The behaviour seems to be caused by the default enableManagedTerminationProtection = true.

See the discussion in the original closed issue and my unaddressed comment: https://github.com/aws/aws-cdk/issues/14732#issuecomment-991402770.

Reproduction Steps

Please see https://github.com/aws/aws-cdk/issues/14732.

In short, try to delete a stack containing an ECS cluster that uses an AsgCapacityProvider with default settings.

What did you expect to happen?

Either:

CloudFormation does not hang but fails as fast as possible with an error message about the termination protection.
The stack is successfully deleted as there are no running ECS tasks anymore.

What actually happened?

The CF stack got stuck in DELETE_IN_PROGRESS.

CDK CLI Version

2.3.0

Framework Version

2.3.0

Node.js Version

v16.8.0

OS

macOS

Language

Java

Language Version

11.0.8

Other information

Workaround

My current workaround: set AsgCapacityProvider enableManagedTerminationProtection = false.
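
A minimal sketch of this workaround, written in TypeScript to match the snippet further down the thread (the report itself used Java). The stack name, construct IDs, instance type, and capacity limits are illustrative assumptions and not taken from the report; only the enableManagedTerminationProtection setting reflects the workaround described above.

```typescript
import { Stack, StackProps } from 'aws-cdk-lib';
import * as autoscaling from 'aws-cdk-lib/aws-autoscaling';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import { Construct } from 'constructs';

// Hypothetical stack used only to illustrate the workaround.
export class ClusterStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const vpc = new ec2.Vpc(this, 'Vpc', { maxAzs: 2 });
    const cluster = new ecs.Cluster(this, 'Cluster', { vpc });

    // Instance type and capacity limits are placeholder values.
    const autoScalingGroup = new autoscaling.AutoScalingGroup(this, 'Asg', {
      vpc,
      instanceType: new ec2.InstanceType('t3.micro'),
      machineImage: ecs.EcsOptimizedImage.amazonLinux2(),
      minCapacity: 0,
      maxCapacity: 2,
    });

    const capacityProvider = new ecs.AsgCapacityProvider(this, 'AsgCapacityProvider', {
      autoScalingGroup,
      // The workaround: override the default (true) so that instance
      // scale-in protection does not block deletion of the ASG.
      enableManagedTerminationProtection: false,
    });
    cluster.addAsgCapacityProvider(capacityProvider);
  }
}
```

Here enableManagedScaling is left at its default, which is the combination asked about in the documentation questions below. Whether that combination is officially supported is exactly the open question, so treat this as a starting point and not as a recommendation.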

Documentation questions/enhancement requests

From https://docs.aws.amazon.com/cdk/api/latest/docs/aws-ecs-readme.html (emphasis mine):

By default, an Auto Scaling Group Capacity Provider will manage the Auto Scaling Group’s size for you. It will also enable managed termination protection, in order to prevent EC2 Auto Scaling from terminating EC2 instances that have tasks running on them. If you want to disable this behavior, set both enableManagedScaling to and enableManagedTerminationProtection to false.

It’s not fully clear from the description that the flag simply blocks deletion of the ASG. I got the incorrect impression that it somehow detects that no ECS tasks are running and allows deletion in that case.
What are the risks of turning this protection off? E.g. we don’t want ECS tasks to shut down at random times.
Is it OK to set enableManagedTerminationProtection=false + enableManagedScaling=true? It seems to work but is against the documentation (“If you want to disable this behavior, set both enableManagedScaling to and enableManagedTerminationProtection to false.”).

Issue Analytics

State:
Created 2 years ago
Reactions:7
Comments:12 (3 by maintainers)

Top GitHub Comments

3 reactions

fschollmeyer commented, Feb 14, 2022

Hi everyone, we have the same issue, not just when deleting a cluster but also when trying to update the AMI ID used for the cluster. Updating the MachineImage in the ASG leads to a new LaunchConfiguration and therefore a new Auto Scaling group. Is there any way around this? Or do we have to write a custom resource to enable and disable termination protection on demand?

1 reaction

elliot-nelson commented, Jun 9, 2022

The solution suggested by @gshpychka works great for us. In our case, we were experiencing the same problem, not with a capacity provider but with a custom termination policy lambda.

Normally, the CDK wants to delete the ASG, which triggers a scale-in that waits for instances to terminate, but while that happens the CDK is dismantling the roles and permissions of the custom termination policy lambda, so it can no longer tell the ASG that any instances are safe to terminate.

In this case you can create the custom resource, then make it depend on the ASG. That forces your CR to be deleted before the ASG, which force-deletes the ASG, preventing it from calling the custom termination policy.

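    // Assumes: import * as cr from 'aws-cdk-lib/custom-resources';
    // On stack deletion, force-delete the ASG through the AWS SDK.
    const asgForceDelete = new cr.AwsCustomResource(this, 'AsgForceDelete', {
      onDelete: {
        service: 'AutoScaling',
        action: 'deleteAutoScalingGroup',
        parameters: {
          AutoScalingGroupName: this.autoScalingGroup.autoScalingGroupName,
          // Delete the group and its instances without waiting for them to terminate.
          ForceDelete: true
        }
      },
      policy: cr.AwsCustomResourcePolicy.fromSdkCalls({
        resources: cr.AwsCustomResourcePolicy.ANY_RESOURCE
      })
    });
    // The custom resource depends on the ASG, so it is deleted before the ASG.
    asgForceDelete.node.addDependency(this.autoScalingGroup);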
