Wrong user when running eks_rolling_update script on a separate cluster
Hi!
We’ve implemented the eks-rolling-update script as a separate stage in our CI (GitLab). There are two clusters involved:
- “gitlab-runners” cluster, where the script is executed inside a GitLab runner
- “dev” cluster, the destination cluster that the script must affect
When the script is executed inside the GitLab runner, we receive the following permissions-related error:
$ eks_rolling_update.py --cluster_name ${TF_VAR_cluster_name}
2020-12-10 13:28:28,187 INFO Describing autoscaling groups...
2020-12-10 13:28:28,194 INFO Pausing k8s autoscaler...
2020-12-10 13:28:28,203 INFO Scaling of k8s autoscaler failed. Error code was Forbidden, {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"deployments.apps \"cluster-autoscaler\" is forbidden: User \"system:serviceaccount:eks:default\" cannot patch resource \"deployments\" in API group \"apps\" in the namespace \"kube-system\"","reason":"Forbidden","details":{"name":"cluster-autoscaler","group":"apps","kind":"deployments"},"code":403}
. Exiting.
That user "system:serviceaccount:eks:default" belongs to the “gitlab-runners” cluster, not to “dev” (the eks namespace exists only in the “gitlab-runners” cluster). Moreover, if we get inside this GitLab runner’s container and scale the autoscaler manually, everything works fine: the deployment in the “dev” cluster scales up and down.
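For illustration, the manual check from inside the runner container was roughly the following (a sketch; the exact commands and replica counts are assumptions, using the same kubeconfig the CI job generates):

# Hypothetical manual check from inside the GitLab runner container,
# using the kubeconfig generated by "aws eks update-kubeconfig".
kubectl -n kube-system scale deployment cluster-autoscaler --replicas=0
kubectl -n kube-system get deployment cluster-autoscaler
kubectl -n kube-system scale deployment cluster-autoscaler --replicas=2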
That means the kubeconfig file and AWS credentials are configured properly.
Note that locally eks_rolling_update.py works fine as well (with the same variables and credentials used in CI).
Below is our eks-rolling-upgrade stage in GitLab CI (aws cli, kubectl and eks-rolling-update are already preinstalled in the image):
upgrade:
  stage: rolling-upgrade
  variables:
    AWS_DEFAULT_REGION: "eu-west-1"
    K8S_AUTOSCALER_ENABLED: "true"
    GLOBAL_MAX_RETRY: 20
    K8S_AUTOSCALER_NAMESPACE: "kube-system"
    K8S_AUTOSCALER_DEPLOYMENT: "cluster-autoscaler"
    K8S_AUTOSCALER_REPLICAS: 2
    KUBECONFIG: "/root/.kube/config"
  script:
    - source variables
    - aws eks --region eu-west-1 update-kubeconfig --name ${TF_VAR_cluster_name}
    - eks_rolling_update.py --cluster_name ${TF_VAR_cluster_name}
  when: manual
  timeout: 4h
If any additional details are needed, please let me know. Looking forward to your reply. Thanks in advance!
Version of eks-rolling-update: most recent (10-Dec-2020)
Version of Kubernetes: 1.18
Top GitHub Comments
@chadlwilson thanks a lot for the hint! I’ve added automountServiceAccountToken: false to the default service account and it worked.

Our CI is in Kubernetes (a different cluster to the target) and the change to default to in-cluster config caused issues for us too. The workaround was to disable automounting the service account token in our CI agent pods, so it falls back to the regular kube config/context. We had no need for that token inside our agent pods, but if you need it for some other reason, I imagine you could have issues.