Wrong user when running eks_rolling_update script on a separate cluster
Hi!
We’ve implemented the eks-rolling-update script as a separate stage in our CI (GitLab). There are two clusters involved:
- “gitlab-runners” cluster, where the script is executed inside a GitLab runner
- “dev” cluster, the destination cluster that the script must affect
When the script is executed inside the GitLab runner, we receive the following permissions-related error:
$ eks_rolling_update.py --cluster_name ${TF_VAR_cluster_name}
2020-12-10 13:28:28,187 INFO Describing autoscaling groups...
2020-12-10 13:28:28,194 INFO Pausing k8s autoscaler...
2020-12-10 13:28:28,203 INFO Scaling of k8s autoscaler failed. Error code was Forbidden, {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"deployments.apps \"cluster-autoscaler\" is forbidden: User \"system:serviceaccount:eks:default\" cannot patch resource \"deployments\" in API group \"apps\" in the namespace \"kube-system\"","reason":"Forbidden","details":{"name":"cluster-autoscaler","group":"apps","kind":"deployments"},"code":403}
. Exiting.
That user "system:serviceaccount:eks:default" belongs to the “gitlab-runners” cluster, not to “dev” (the eks namespace exists only in the “gitlab-runners” cluster). Moreover, if we get inside this GitLab runner’s container and scale the autoscaler manually, everything works fine: the deployment in the “dev” cluster scales up and down.
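For illustration, the manual check from inside the runner container was roughly the following (a sketch; the exact commands and replica counts are assumptions, using the same kubeconfig the CI job generates):

# Hypothetical manual check from inside the GitLab runner container,
# using the kubeconfig generated by "aws eks update-kubeconfig".
kubectl -n kube-system scale deployment cluster-autoscaler --replicas=0
kubectl -n kube-system get deployment cluster-autoscaler
kubectl -n kube-system scale deployment cluster-autoscaler --replicas=2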
That means the kubeconfig file and AWS credentials are configured properly.
Note that locally eks_rolling_update.py works fine as well (with the same variables and credentials used in CI).
Below is our eks-rolling-upgrade stage in GitLab CI (aws cli, kubectl and eks-rolling-update are already preinstalled in the image):
upgrade:
  stage: rolling-upgrade
  variables:
    AWS_DEFAULT_REGION: "eu-west-1"
    K8S_AUTOSCALER_ENABLED: "true"
    GLOBAL_MAX_RETRY: 20
    K8S_AUTOSCALER_NAMESPACE: "kube-system"
    K8S_AUTOSCALER_DEPLOYMENT: "cluster-autoscaler"
    K8S_AUTOSCALER_REPLICAS: 2
    KUBECONFIG: "/root/.kube/config"
  script:
    - source variables
    - aws eks --region eu-west-1 update-kubeconfig --name ${TF_VAR_cluster_name}
    - eks_rolling_update.py --cluster_name ${TF_VAR_cluster_name}
  when: manual
  timeout: 4h
If any additional details are needed, please let me know. Looking forward to your reply. Thanks in advance!
Version of eks-rolling-update: most recent (10-Dec-2020)
Version of Kubernetes: 1.18
Top GitHub Comments
@chadlwilson thanks a lot for the hint! I’ve added automountServiceAccountToken: false to the default service account and it worked.

Our CI is in Kubernetes (a different cluster to the target) and the change to default to in-cluster config caused issues for us too. The workaround was to disable automounting the service account token in our CI agent pods, so it falls back to the regular kube config/context. We had no need for that token inside our agent pods, but if you need it for some other reason, I imagine you could have issues.