[BUG] ManagedIdentityCredential fails sometimes in AKS
Describe the bug
We are using the ManagedIdentityCredential class to get access tokens for managed identities. We are deploying the application to AKS, where we are using https://github.com/Azure/aad-pod-identity.
The instance of the ManagedIdentityCredential is a singleton.
Sometimes after starting a new pod, we get the following exception every time the pod tries to get an access token: Azure.Identity.CredentialUnavailableException: ManagedIdentityCredential authentication unavailable. No Managed Identity endpoint found.
If the pod is able to start without this exception, the exception is never observed during the lifetime of the pod.
The problem seems to be that there can be a delay after pod startup before the IMDS endpoint becomes available to the pod in AKS. If the pod tries to get an access token before the endpoint is available, the credential ends up in a bad state from which it never recovers.
The cause of the issue is probably this code: https://github.com/Azure/azure-sdk-for-net/blob/705f3296529b2e30e16bfe42fdd2245511a5c0b0/sdk/identity/Azure.Identity/src/ManagedIdentityClient.cs#L51-L67
The ManagedIdentityClient tries several strategies to get a ManagedIdentitySource. If all of them fail, it sets the value of _identitySourceAsyncLock to null and will therefore never try to resolve the ManagedIdentitySource again.
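To make the suspected failure mode easier to discuss, here is a minimal Python model of it. This is not the SDK code: LatchingClient, RetryingClient, make_probe and outcomes are hypothetical names, and the probe only stands in for the endpoint discovery. The first class caches the result of the first discovery attempt even when it failed, which is the behavior described above; the second caches only a successful discovery, which is one way the state could become recoverable.

```python
from typing import Callable, List, Optional


class LatchingClient:
    """Resolves the identity source once and caches the outcome, even a failure."""

    def __init__(self, probe: Callable[[], Optional[str]]) -> None:
        self._probe = probe
        self._resolved = False            # set after the first attempt, successful or not
        self._source: Optional[str] = None

    def get_token(self) -> str:
        if not self._resolved:
            self._source = self._probe()  # returns None while the endpoint is still down
            self._resolved = True         # latch: discovery never runs again
        if self._source is None:
            raise RuntimeError("No Managed Identity endpoint found.")
        return f"token-from-{self._source}"


class RetryingClient(LatchingClient):
    """Same lookup, but only a successful discovery is cached."""

    def get_token(self) -> str:
        if self._source is None:
            self._source = self._probe()  # retried on every call until it succeeds
        if self._source is None:
            raise RuntimeError("No Managed Identity endpoint found.")
        return f"token-from-{self._source}"


def make_probe(fail_first: int) -> Callable[[], Optional[str]]:
    """Hypothetical endpoint probe that fails `fail_first` times, then succeeds."""
    calls = {"n": 0}

    def probe() -> Optional[str]:
        calls["n"] += 1
        return None if calls["n"] <= fail_first else "imds"

    return probe


def outcomes(client: LatchingClient, attempts: int) -> List[str]:
    """Call get_token `attempts` times and record each result."""
    results = []
    for _ in range(attempts):
        try:
            results.append(client.get_token())
        except RuntimeError:
            results.append("unavailable")
    return results
```

With a probe that fails only once, the latching variant reports "unavailable" on every call, while the retrying variant recovers on the second call.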
Expected behavior
The exception should not occur, or it should be possible to recover from this failed state once the IMDS endpoint becomes available.
Actual behavior
The exception occurs and it is not possible to recover from the failed state.
To Reproduce
- Deploy AKS
- Deploy the aad-pod-identity services
- Deploy a pod using the ManagedIdentityCredential
- Pray for observing the exception
- Observe the exception
Environment:
- AKS 1.17.11
- AAD Pod Identity 1.7.1
- Azure.Identity 1.2.3
Top GitHub Comments
@christothes The scenario I had in mind was using DefaultAzureCredential. When I try to get an access token via a DefaultAzureCredential, it tries to get a token via EnvironmentCredential, ManagedIdentityCredential, SharedTokenCacheCredential, and so on. https://github.com/Azure/azure-sdk-for-net/blob/a79bd104ab8c2cf79e0f9b2c60619de85f101d71/sdk/identity/Azure.Identity/src/DefaultAzureCredential.cs#L169-L220
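The fallback order can be modelled with a small Python sketch. It is not the SDK implementation: the source functions, the token string, and all timings are hypothetical, and time is simulated rather than slept. Each source reports either a token or "unavailable" together with the time it needed to decide, so the sketch also shows that a source which only gives up after several timeouts delays every credential that comes after it in the chain.

```python
from typing import Callable, List, Optional, Tuple

# A source reports (token, or None if unavailable; simulated seconds spent deciding).
Source = Callable[[], Tuple[Optional[str], float]]


def env_source() -> Tuple[Optional[str], float]:
    # Assumption: no environment variables are configured, so this fails immediately.
    return None, 0.0


def imds_source(retries: int, timeout_s: float) -> Source:
    """Hypothetical IMDS-like source whose endpoint never answers."""
    def probe() -> Tuple[Optional[str], float]:
        # Every attempt (the first one plus each retry) costs one full timeout.
        return None, (retries + 1) * timeout_s
    return probe


def cache_source() -> Tuple[Optional[str], float]:
    # Assumption: a cached token is available after a short lookup.
    return "token-from-shared-cache", 0.1


def get_token(chain: List[Source]) -> Tuple[str, float]:
    """Try each source in order; return the first token and the total time spent."""
    elapsed = 0.0
    for source in chain:
        token, spent = source()
        elapsed += spent
        if token is not None:
            return token, elapsed
    raise RuntimeError("no credential in the chain could provide a token")
```

For example, `get_token([env_source, imds_source(3, 1.0), cache_source])` still returns the cached token, but only after the four simulated IMDS timeouts have been paid.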
When you now introduce a retry mechanism in the ImdsManagedIdentitySource, the ManagedIdentityCredential.GetTokenAsync() method would block until the timeouts and retries of ImdsManagedIdentitySource are exhausted. This would always happen, even if the IMDS endpoint will never be available, which is quite often the case when using DefaultAzureCredential.
Please note that the initContainers sample
https://github.com/Azure/azure-sdk-for-python/blob/master/sdk/identity/azure-identity/tests/pod-identity/test-pod-identity/templates/job.yaml#L23 must be extended with &mi_res_id=pod-identity-test, otherwise it will check for any managed identity (there might be others on the VM) …
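As an illustration of what the extra parameter changes, here is a small Python sketch that builds an IMDS token request URL with and without mi_res_id. It assumes the init container probes the standard IMDS token endpoint; the sample's actual command is not reproduced here, and imds_token_url and the resource value in the usage note are hypothetical choices for the example. A real request to this endpoint also needs the Metadata: true header.

```python
from typing import Optional
from urllib.parse import urlencode

# Standard Azure IMDS token endpoint (link-local address, reachable only from inside Azure).
IMDS_TOKEN_ENDPOINT = "http://169.254.169.254/metadata/identity/oauth2/token"


def imds_token_url(resource: str, mi_res_id: Optional[str] = None) -> str:
    """Build a token request URL, optionally pinned to one managed identity."""
    params = {"api-version": "2018-02-01", "resource": resource}
    if mi_res_id is not None:
        # Without this parameter the request is answered for any identity available.
        params["mi_res_id"] = mi_res_id
    return f"{IMDS_TOKEN_ENDPOINT}?{urlencode(params)}"
```

Calling `imds_token_url("https://management.azure.com/", "pod-identity-test")` yields a URL ending in `&mi_res_id=pod-identity-test`, while omitting the second argument leaves the identity unspecified.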