(aws-ecs): Can't delete a stack with ASG Capacity providers

It does not seem to be possible to gracefully delete an ECS cluster that is associated with an ASG Capacity Provider. CloudFormation hangs and never finishes unless the ASG is deleted manually.

Reproduction Steps

  1. Create an ECS cluster with:
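import { AutoScalingGroup } from '@aws-cdk/aws-autoscaling';
import { InstanceType } from '@aws-cdk/aws-ec2';
import { AsgCapacityProvider, Cluster, EcsOptimizedImage } from '@aws-cdk/aws-ecs';

// Assumes `vpc` and `props` are defined in the enclosing Stack constructor.
const cluster = new Cluster(this, 'EcsCluster', {
  vpc,
  clusterName: props.clusterName,
});

const autoScalingGroup = new AutoScalingGroup(this, 'Asg', {
  vpc,
  machineImage: EcsOptimizedImage.amazonLinux2(),
  instanceType: new InstanceType('t3.micro'),
  minCapacity: 1,
  maxCapacity: 100,
});

// Managed termination protection is left at its default here.
const capacityProvider = new AsgCapacityProvider(
  this,
  'AsgCapacityProvider',
  {
    autoScalingGroup,
    capacityProviderName: props.clusterName,
  },
);
cluster.addAsgCapacityProvider(capacityProvider);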
  2. Delete the stack (I did it through the AWS Console)
  3. Wait for it…
  4. Go grab a cup of ☕
  5. Realize that the stack deletion will never finish

What did you expect to happen?

The CF stack should be properly and gracefully removed.

What actually happened?

The CF stack got stuck in DELETE_IN_PROGRESS, with the following errors per resource:

AWS::EC2::InternetGateway

The internetGateway 'igw-03ec296b77d21956f' has dependencies and cannot be deleted. (Service: AmazonEC2; Status Code: 400; Error Code: DependencyViolation; Request ID: f50851df-172c-4365-b761-e6b710f5b30b; Proxy: null)

AWS::ECS::Cluster

Resource handler returned message: "Error occurred during operation 'DeleteClusters SDK Error: The Cluster cannot be deleted while Container Instances are active or draining. (Service: AmazonECS; Status Code: 400; Error Code: ClusterContainsContainerInstancesException; Request ID: 005e0a22-5547-44da-a51e-5e6b45b39b84; Proxy: null)'." (RequestToken: 50d43055-7cc8-6306-0de1-48c93e63cf96, HandlerErrorCode: GeneralServiceException)

AWS::AutoScaling::LaunchConfiguration

Cannot delete launch configuration lulz-cluster-AsgLaunchConfig6D4F96BB-15LZGM814H5M4 because it is attached to AutoScalingGroup lulz-cluster-AsgASGD1D7B4E2-R02BX4676AJJ (Service: AmazonAutoScaling; Status Code: 400; Error Code: ResourceInUse; Request ID: 3583ab1b-7c1a-47de-929a-67cb705f684f; Proxy: null)

AWS::AutoScaling::AutoScalingGroup

Group did not stabilize. {current/minSize/maxSize} group size = {1/0/0}.

The stack finished deleting after I manually removed the ASG.
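
One plausible reading of the `{1/0/0}` message, sketched as a toy model rather than actual Auto Scaling code: on deletion the group is scaled to zero, but scale-in skips any instance that is protected from scale-in, which is the mechanism managed termination protection relies on. The `Instance` type, the `scaleInToZero` function and the instance IDs below are hypothetical and used only for illustration.

```typescript
// Toy model (not AWS code): why a group can stay at size 1 with min/max 0.
// `Instance` and `scaleInToZero` are hypothetical names for illustration.
interface Instance {
  id: string;
  protectedFromScaleIn: boolean;
}

// Scale-in may only terminate unprotected instances; protected ones remain.
function scaleInToZero(instances: Instance[]): Instance[] {
  return instances.filter((i) => i.protectedFromScaleIn);
}

const remaining = scaleInToZero([
  { id: "i-aaa", protectedFromScaleIn: true },
  { id: "i-bbb", protectedFromScaleIn: false },
]);

// A non-empty result means the group never reaches size 0, so the stack
// deletion keeps waiting until the instance or the ASG is removed by hand.
console.log(`current/min/max = ${remaining.length}/0/0`);
```

Under this reading, manually removing the ASG works because it bypasses the protected instance that normal scale-in would never terminate.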

Environment

  • CDK CLI Version: 1.104.0
  • Framework Version: 1.104.0
  • Node.js Version: 14.16.1
  • OS: macOS 10.15.7
  • Language (Version): TypeScript 4.2.4

Other

This seems like a related discussion: https://github.com/aws/containers-roadmap/issues/631#issuecomment-841142816


This is a 🐛 Bug Report

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 3
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

7 reactions
gshpychka commented, Mar 16, 2022

@SoManyHs this is still an issue, since we have to use manual hacks to destroy the stack.

0 reactions
metametadata commented, Dec 11, 2021

@SoManyHs

Experiencing the same issue with the defaults in addAsgCapacityProvider. It was surprising, as we didn’t have this issue with the now-deprecated addCapacity, and we have no ECS tasks in the ASG when we delete the stack.

  1. Feature request. Ideally, CloudFormation should not hang but fail as fast as possible, with an error message about the termination protection.

  2. Documentation enhancement request. From https://docs.aws.amazon.com/cdk/api/latest/docs/aws-ecs-readme.html:

    By default, an Auto Scaling Group Capacity Provider will manage the Auto Scaling Group’s size for you. It will also enable managed termination protection, in order to prevent EC2 Auto Scaling from terminating EC2 instances that have tasks running on them. If you want to disable this behavior, set both enableManagedScaling to and enableManagedTerminationProtection to false.

    It’s not fully clear from the description that the flag simply prevents deletion of the ASG. I got the incorrect impression that it somehow cleverly understands that there are no ECS tasks running and allows deletion in that case.

  3. Question/documentation enhancement request. We’ll likely have to set enableManagedTerminationProtection to false in our automated undeploy code. But what are the risks of turning this protection off? E.g. we don’t want ECS tasks to shut down at random times.

  4. Question/documentation enhancement request. Is it OK to set enableManagedTerminationProtection=false + enableManagedScaling=true? It seems to work but contradicts the documentation (“If you want to disable this behavior, set both enableManagedScaling to and enableManagedTerminationProtection to false.”).
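
For questions 3 and 4, here is a minimal sketch of how an undeploy-friendly configuration could be expressed. The `CapacityProviderFlags` interface and the `flagsFor` helper are hypothetical; only the two flag names come from the CDK README quoted above. The sketch assumes that managed scaling can stay enabled while managed termination protection is off, which matches what was observed in question 4 but is not confirmed by the quoted documentation.

```typescript
// Hypothetical local type: mirrors only the two flag names quoted from the
// CDK README; it is not the real AsgCapacityProviderProps interface.
interface CapacityProviderFlags {
  enableManagedScaling: boolean;
  enableManagedTerminationProtection: boolean;
}

// Hypothetical helper: pick flags depending on whether the stack must be
// deletable without manual cleanup (e.g. short-lived test environments).
function flagsFor(mustBeDeletable: boolean): CapacityProviderFlags {
  return {
    // Keep ECS-managed scaling of the ASG in both cases (assumption).
    enableManagedScaling: true,
    // With protection on, instances are shielded from scale-in, which is
    // what appeared to block stack deletion in this issue.
    enableManagedTerminationProtection: !mustBeDeletable,
  };
}

const flags = flagsFor(true);
console.log(flags);
```

The result could be spread into the props from the reproduction, for example `new AsgCapacityProvider(this, 'AsgCapacityProvider', { autoScalingGroup, ...flags })`. With protection disabled, scale-in may terminate an instance that still has tasks running on it, so this is probably best limited to environments where that is acceptable.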
