
aws-auth configmap does not get re-created after a cluster replacement, preventing nodes from joining the cluster

When an EKS cluster is replaced, the replacement itself succeeds, but the aws-auth ConfigMap that holds the user/role mappings is not recreated. This in turn prevents the new worker nodes from joining the cluster.
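For context on why a missing aws-auth blocks nodes: EKS worker nodes authenticate with their IAM instance role, and aws-auth must map that role into the system:bootstrappers and system:nodes groups. The TypeScript sketch below builds such a manifest as a plain object. The helper names (nodeRoleMapping, toMapRolesYaml, buildAwsAuth) and the example ARN are hypothetical, and pulumi-eks may generate the ConfigMap differently.

```typescript
// Hypothetical helpers for illustration; not part of the pulumi-eks API.

// One entry of the YAML list stored under data.mapRoles.
interface RoleMapping {
  rolearn: string;
  username: string;
  groups: string[];
}

interface ConfigMapManifest {
  apiVersion: "v1";
  kind: "ConfigMap";
  metadata: { name: string; namespace: string };
  data: { mapRoles: string };
}

// The standard mapping that lets nodes using this IAM role register with the cluster.
function nodeRoleMapping(roleArn: string): RoleMapping {
  return {
    rolearn: roleArn,
    // {{EC2PrivateDNSName}} is a placeholder resolved on the EKS side, not by this code.
    username: "system:node:{{EC2PrivateDNSName}}",
    groups: ["system:bootstrappers", "system:nodes"],
  };
}

// Hand-rolled YAML serialization, sufficient for these simple string values.
function toMapRolesYaml(mappings: RoleMapping[]): string {
  const lines: string[] = [];
  for (const m of mappings) {
    lines.push(`- rolearn: ${m.rolearn}`);
    lines.push(`  username: ${m.username}`);
    lines.push("  groups:");
    for (const g of m.groups) {
      lines.push(`    - ${g}`);
    }
  }
  return lines.join("\n") + "\n";
}

// Builds the aws-auth ConfigMap that must exist in kube-system for nodes to join.
function buildAwsAuth(nodeRoleArns: string[]): ConfigMapManifest {
  return {
    apiVersion: "v1",
    kind: "ConfigMap",
    metadata: { name: "aws-auth", namespace: "kube-system" },
    data: { mapRoles: toMapRolesYaml(nodeRoleArns.map(nodeRoleMapping)) },
  };
}

// Placeholder ARN; substitute the instance role used by the worker nodes.
const awsAuth = buildAwsAuth(["arn:aws:iam::123456789012:role/example-node-role"]);
console.log(awsAuth.data.mapRoles);
```

If this object never reaches the new cluster, kubectl get cm aws-auth -n kube-system finds nothing and the nodes' IAM role has no Kubernetes identity, which is the state described in the repro below.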

The aws-auth ConfigMap is created by pulumi-eks. Because none of the IAM resources it depends on are replaced or updated during the cluster replacement, aws-auth does not need to be replaced or updated either. However, when the old cluster is torn down the ConfigMap goes away with it, and the Pulumi Kubernetes provider does not seem to notice that aws-auth must be recreated in the new cluster.

Per an offline discussion with Luke, the thought was that the Kubernetes provider kx-eks-cluster-eks-k8s should have been replaced instead of updated. Since the provider is the only dependency of aws-auth that is affected by the cluster replacement, replacing the provider would have caused aws-auth to be recreated.
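The reasoning above can be illustrated with a small TypeScript model. It is a hypothetical sketch, not Pulumi's engine: it encodes only the single rule stated in the discussion (a resource whose provider is replaced gets recreated, while an in-place provider update leaves the resource alone). The resource name aws-auth and the Op type are illustrative assumptions; the other two names come from the plan output below.

```typescript
// A deliberately simplified model of the replace-vs-update decision.
// It is NOT Pulumi's planning logic; it encodes only one rule:
// a resource whose provider is replaced is itself replaced (recreated).

type Op = "same" | "update" | "replace";

interface Resource {
  name: string;
  provider?: string; // name of the provider resource that manages this one
  ownOp: Op;         // operation implied by diffing this resource's own inputs
}

// Resources must be listed so that providers come before the resources they manage.
function plan(resources: Resource[]): Map<string, Op> {
  const ops = new Map<string, Op>();
  for (const r of resources) {
    const providerReplaced =
      r.provider !== undefined && ops.get(r.provider) === "replace";
    ops.set(r.name, providerReplaced ? "replace" : r.ownOp);
  }
  return ops;
}

// providerOp is the operation chosen for the Kubernetes provider pointing at the cluster.
function scenario(providerOp: Op): Map<string, Op> {
  return plan([
    { name: "kx-eks-cluster-eksCluster", ownOp: "replace" },
    { name: "kx-eks-cluster-eks-k8s", ownOp: providerOp },
    // aws-auth's own inputs (the IAM role mappings) are unchanged.
    { name: "aws-auth", provider: "kx-eks-cluster-eks-k8s", ownOp: "same" },
  ]);
}

console.log(scenario("update").get("aws-auth"));  // → same
console.log(scenario("replace").get("aws-auth")); // → replace
```

With the provider merely updated, the model leaves aws-auth untouched, which matches the observed bug: the ConfigMap disappeared with the old cluster and nothing recreated it. With the provider replaced, aws-auth is recreated against the new cluster.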

Changes:
 
    Type                           Name                                   Operation
-+  aws:eks:Cluster                kx-eks-cluster-eksCluster              replaced
~   pulumi:providers:kubernetes    kx-eks-cluster-eks-k8s                 updated
~   aws:ec2:SecurityGroup          kx-eks-cluster-nodeSecurityGroup       updated
~   pulumi-nodejs:dynamic:Resource kx-eks-cluster-vpc-cni                 updated
-+  aws:ec2:LaunchConfiguration    kx-eks-cluster-nodeLaunchConfiguration replaced
~   aws:cloudformation:Stack       kx-eks-cluster-nodes                   updated
~   pulumi:providers:kubernetes    kx-eks-cluster-provider                updated
 
Resources:
    +-replaced 2
    ~ updated 5
    18 unchanged

To repro this, we’ll use the same code from https://github.com/pulumi/pulumi-eks/issues/69#issuecomment-485060221.

Steps:

  1. Download pulumi-full-aws-eks.zip
  2. Run pulumi up in the unzipped directory
  3. After the initial deployment completes, comment out line #74 (subnetIds.pop()) and run another update
  4. After about 20 minutes, the EKS cluster replacement onto 3 subnets completes
  5. kubectl cluster-info returns success
  6. kubectl get pods --all-namespaces shows the CoreDNS Pods stuck in Pending, as there are no worker nodes to schedule them onto
  7. kubectl get cm aws-auth -n kube-system -o yaml returns nothing
  8. kubectl get nodes -o wide --show-labels returns nothing

/cc @lukehoban @hausdorff

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

ellismg commented, May 30, 2019

https://github.com/pulumi/pulumi/pull/2766 is now merged, so you should be able to pick up a dev build and run with it, @lblackstone. Let me know if you need help.

metral commented, Jun 7, 2019

Thanks to the fixes in @pulumi/pulumi v0.17.16 and @pulumi/kubernetes v0.24.0, this bug & scenario is now being added to the set of tests.
