Max retries exceeded during Kubernetes upgrade
Describe the bug
These are the same symptoms as reported in #12725.
i.e. running az aks upgrade --subscription $azSub --resource-group $azRG --name $azClus --kubernetes-version 1.15.12
to upgrade from 1.13.12 to 1.15.12 results in error:
Command Name
az aks upgrade
Errors:
request failed: Error occurred in request., RetryError: HTTPSConnectionPool(host='management.azure.com', port=443): Max retries exceeded with url: /subscriptions/<subscriptionid>/resourceGroups/<resourceGroup>/providers/Microsoft.ContainerService/managedClusters/<clusterName>?api-version=2020-03-01 (Caused by ResponseError('too many 500 error responses',))
To Reproduce:
Steps to reproduce the behavior. Note that argument values have been redacted, as they may contain sensitive information.
export azSub={}
export azRG={}
export azClus={}
az login
az aks get-credentials --subscription $azSub --resource-group $azRG --name $azClus
az aks show --subscription $azSub --resource-group $azRG --name $azClus --output table
az aks get-upgrades --subscription $azSub --resource-group $azRG --name $azClus --output table
az aks upgrade --subscription $azSub --resource-group $azRG --name $azClus --kubernetes-version 1.15.12
Expected Behavior
Kubernetes should be upgraded successfully, without throwing an exception.
Environment Summary
Linux-4.19.104-microsoft-standard-x86_64-with-debian-buster-sid
Python 3.6.10
Installer: DEB
azure-cli 2.9.1
Note: I originally tried with azure-cli 2.9.0. I then upgraded via the steps below, but the same error occurred in the same session.
apt-get update
apt-get dist-upgrade
Additional Context
Yesterday I ran the same commands on the same client (targeting a different subscription, and still using Azure CLI v2.9.0) without error.
After receiving the error I attempted the upgrade via the browser interface (i.e. went to the cluster’s page in https://portal.azure.com/, clicked Upgrade, selected v1.15.12, then clicked Save). Within seconds I got the following error:
Failed to save Kubernetes service ‘cms-fr-prod-account-aks-k8s-cluster’. Error: The credentials in ServicePrincipalProfile were invalid. Please see https://aka.ms/aks-sp-help for more details. (Details: adal: Refresh request failed. Status Code = ‘401’. Response body: {“error”:“invalid_client”,“error_description”:“AADSTS7000222: The provided client secret keys are expired. Visit the Azure Portal to create new keys for your app, or consider using certificate credentials for added security: https://docs.microsoft.com/azure/active-directory/develop/active-directory-certificate-credentials\r\nTrace ID: fc15ef37-150a-460c-a67e-97f771da0d00\r\nCorrelation ID: cc52bdf6-2d93-4bdd-afc9-56965752873e\r\nTimestamp: 2020-07-17 05:56:24Z”,“error_codes”:[7000222],“timestamp”:“2020-07-17 05:56:24Z”,“trace_id”:“fc15ef37-150a-460c-a67e-97f771da0d00”,“correlation_id”:“cc52bdf6-2d93-4bdd-afc9-56965752873e”,“error_uri”:“https://login.microsoftonline.com/error?code=7000222”})
I have both Contributor and Reader roles assigned at subscription level and, as there are no deny policies removing that access, the same (inherited) permissions on the cluster. I’ve tried logging out and back in to ensure a fresh session / recent authentication. Note: We use Okta as our IdP, which uses AD to provide the credentials under the covers.
Issue Analytics
- State:
- Created 3 years ago
- Comments:5 (1 by maintainers)
Top GitHub Comments
@JohnLBevan Apologies for the late reply. Could you please let us know if you are still facing this issue? It has been open for quite some time. Please let us know if you need any assistance with it. Awaiting your reply.
Thanks for the prompt @navba-MSFT ; the issue was resolved by refreshing the service principal as noted earlier.
Happy for this ticket to be closed; though if the exception message could be improved or pre-upgrade validation checks run to help avoid this issue / flag up the underlying cause more clearly, that would be very welcome.
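For anyone hitting the same error, here is a rough sketch of the kind of pre-upgrade check suggested above, followed by the service principal refresh that resolved it. It reuses the $azSub, $azRG and $azClus variables from the reproduction steps; the function names (is_expired, check_aks_sp, refresh_aks_sp) are made up for illustration and are not part of the Azure CLI. It assumes GNU date. The credential field name and the reset argument vary by azure-cli version (older releases, such as the 2.9.x used here, report endDate and take --name on reset, while newer ones report endDateTime and take --id), so adjust to your version.

```shell
# Returns 0 if the ISO-8601 timestamp in $1 is already in the past (GNU date assumed).
is_expired() {
  local end_epoch now_epoch
  end_epoch=$(date -u -d "$1" +%s) || return 2
  now_epoch=$(date -u +%s)
  [ "$end_epoch" -lt "$now_epoch" ]
}

# Hypothetical helper: list the expiry of each secret on the cluster's service principal.
check_aks_sp() {
  local sp_id end
  sp_id=$(az aks show --subscription "$azSub" --resource-group "$azRG" --name "$azClus" \
    --query servicePrincipalProfile.clientId --output tsv)
  # Assumption: newer CLI versions expose endDateTime; older ones expose endDate instead.
  az ad sp credential list --id "$sp_id" --query "[].endDateTime" --output tsv |
  while read -r end; do
    if is_expired "$end"; then
      echo "EXPIRED: $end"
    else
      echo "valid until: $end"
    fi
  done
}

# Hypothetical helper: create a new secret and push it to the cluster.
refresh_aks_sp() {
  local sp_id sp_secret
  sp_id=$(az aks show --subscription "$azSub" --resource-group "$azRG" --name "$azClus" \
    --query servicePrincipalProfile.clientId --output tsv)
  # Assumption: older CLI versions take --name here rather than --id.
  sp_secret=$(az ad sp credential reset --id "$sp_id" --query password --output tsv)
  az aks update-credentials --subscription "$azSub" --resource-group "$azRG" --name "$azClus" \
    --reset-service-principal --service-principal "$sp_id" --client-secret "$sp_secret"
}
```

Running check_aks_sp before az aks upgrade should flag an expired secret up front; refresh_aks_sp is only needed if it does. Updating the cluster credentials can take a while and restarts nodes, so treat it as a maintenance operation.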