
[BUG] ManagedIdentityCredential fails sometimes in AKS


Describe the bug

We are using the ManagedIdentityCredential class to get access tokens for managed identities. We deploy the application to AKS, where we use https://github.com/Azure/aad-pod-identity.

The instance of the ManagedIdentityCredential is a singleton.

Sometimes, after a new pod starts, every attempt by that pod to get an access token fails with the exception Azure.Identity.CredentialUnavailableException: ManagedIdentityCredential authentication unavailable. No Managed Identity endpoint found. If the pod starts without hitting this exception, the exception is never observed during the lifetime of the pod.

The problem seems to be that, after a pod starts, there can be a delay before the IMDS endpoint becomes available to it in AKS. If the pod tries to get an access token before the endpoint is available, the credential ends up in a bad state from which it never recovers.

The cause of the issue is probably this code: https://github.com/Azure/azure-sdk-for-net/blob/705f3296529b2e30e16bfe42fdd2245511a5c0b0/sdk/identity/Azure.Identity/src/ManagedIdentityClient.cs#L51-L67

The ManagedIdentityClient tries several strategies to get a ManagedIdentitySource. If all of them fail, it sets the value of _identitySourceAsyncLock to null and will therefore never try to resolve the ManagedIdentitySource again.
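The caching behavior described above can be illustrated with a minimal Python sketch. It is not the Azure.Identity C# code: CachedSourceResolver and the probe callables are hypothetical names, and each probe stands in for one ManagedIdentitySource strategy. The point it shows is that the first resolution result is stored whether or not it is None, so a failure caused by a not-yet-ready endpoint becomes permanent.

```python
import threading
from typing import Callable, Optional, Sequence


class CachedSourceResolver:
    """Hypothetical sketch of the reported pattern: the first resolution
    result is cached, including a failed (None) result."""

    _UNSET = object()  # sentinel meaning "not resolved yet"

    def __init__(self, probes: Sequence[Callable[[], Optional[str]]]):
        # Each probe returns a source name, or None if that source is unavailable.
        self._probes = probes
        self._lock = threading.Lock()
        self._value = self._UNSET

    def get_source(self) -> Optional[str]:
        with self._lock:
            if self._value is not self._UNSET:
                return self._value  # cached result, even when it is None
            source = None
            for probe in self._probes:
                source = probe()
                if source is not None:
                    break
            self._value = source  # a None result is cached forever
            return source


# Usage: the IMDS endpoint is not ready on the first call, ready on the second.
state = {"imds_ready": False}
resolver = CachedSourceResolver([
    lambda: None,  # stands in for sources that do not apply (e.g. App Service)
    lambda: "IMDS" if state["imds_ready"] else None,
])
print(resolver.get_source())  # None (endpoint not ready yet)
state["imds_ready"] = True
print(resolver.get_source())  # None (the failed result was cached)
```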

Expected behavior

The exception should not occur, or it should be possible to recover from the failed state once the IMDS endpoint becomes available.

Actual behavior

The exception occurs, and the credential cannot recover from the failed state.
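One way to get the expected, recoverable behavior is to cache only successful resolutions. The Python sketch below is a hypothetical illustration of that idea (RetryingSourceResolver is not an Azure.Identity type, and this is not necessarily how the SDK fixed the issue): a failed resolution leaves nothing cached, so the next call probes again and can pick up an IMDS endpoint that became available in the meantime.

```python
import threading
from typing import Callable, Optional, Sequence


class RetryingSourceResolver:
    """Hypothetical sketch: cache a successfully resolved source, but
    re-run the probes on the next call if resolution failed."""

    def __init__(self, probes: Sequence[Callable[[], Optional[str]]]):
        # Each probe returns a source name, or None if that source is unavailable.
        self._probes = probes
        self._lock = threading.Lock()
        self._value: Optional[str] = None

    def get_source(self) -> Optional[str]:
        with self._lock:
            if self._value is not None:
                return self._value  # only successful results are cached
            for probe in self._probes:
                source = probe()
                if source is not None:
                    self._value = source
                    return source
            return None  # nothing cached, so the next call probes again


# Usage: the endpoint becomes available between the two calls.
state = {"imds_ready": False}
resolver = RetryingSourceResolver([lambda: "IMDS" if state["imds_ready"] else None])
print(resolver.get_source())  # None
state["imds_ready"] = True
print(resolver.get_source())  # IMDS
```

The trade-off is that every call made while no source is available pays the probing cost again, which matters in environments where IMDS never appears.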

To Reproduce

  1. Deploy AKS
  2. Deploy the aad-pod-identity services
  3. Deploy a pod that uses ManagedIdentityCredential
  4. Pray to observe the exception (it occurs only intermittently)
  5. Observe the exception

Environment:

  • AKS 1.17.11
  • AAD Pod Identity 1.7.1
  • Azure.Identity 1.2.3

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 15 (7 by maintainers)

Top GitHub Comments

PSanetra commented, Feb 1, 2021 (1 reaction)

@christothes The scenario I had in mind was using DefaultAzureCredential.

When I try to get an access token via a DefaultAzureCredential, it tries to get a token via EnvironmentCredential, ManagedIdentityCredential, SharedTokenCacheCredential, and so on.

https://github.com/Azure/azure-sdk-for-net/blob/a79bd104ab8c2cf79e0f9b2c60619de85f101d71/sdk/identity/Azure.Identity/src/DefaultAzureCredential.cs#L169-L220

If you now introduce a retry mechanism in ImdsManagedIdentitySource, the ManagedIdentityCredential.GetTokenAsync() method will block until the timeouts and retries of ImdsManagedIdentitySource are exhausted. This would happen every time, even when the IMDS endpoint will never become available, which is quite often the case when using DefaultAzureCredential.
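To make this concern concrete, here is a hypothetical Python sketch of a DefaultAzureCredential-style chain in which the managed identity step retries with exponential backoff. None of these names (CredentialUnavailable, FakeClock, imds_with_retries, chained_get_token) are Azure.Identity APIs, and the retry parameters are assumptions. A fake clock records the time that would be spent sleeping, so the cost of an IMDS endpoint that never appears is visible without actually waiting.

```python
from typing import Callable, List, Optional, Sequence


class CredentialUnavailable(Exception):
    """Raised by a credential that cannot be used in this environment."""


class FakeClock:
    """Records simulated sleep time instead of actually sleeping."""

    def __init__(self) -> None:
        self.elapsed = 0.0

    def sleep(self, seconds: float) -> None:
        self.elapsed += seconds


def imds_with_retries(probe: Callable[[], Optional[str]], clock: FakeClock,
                      max_retries: int, base_delay: float) -> Callable[[], str]:
    """Hypothetical IMDS credential that retries with exponential backoff."""
    def get_token() -> str:
        for attempt in range(max_retries + 1):
            token = probe()
            if token is not None:
                return token
            if attempt < max_retries:
                clock.sleep(base_delay * (2 ** attempt))  # 1, 2, 4, ... seconds
        raise CredentialUnavailable("No Managed Identity endpoint found.")
    return get_token


def chained_get_token(credentials: Sequence[Callable[[], str]]) -> str:
    """Try each credential in order and return the first token obtained."""
    errors: List[str] = []
    for get_token in credentials:
        try:
            return get_token()
        except CredentialUnavailable as exc:
            errors.append(str(exc))
    raise CredentialUnavailable("; ".join(errors))


# Usage: IMDS never answers, so the chain falls through to a later credential.
clock = FakeClock()
imds = imds_with_retries(lambda: None, clock, max_retries=3, base_delay=1.0)
print(chained_get_token([imds, lambda: "dev-token"]))  # dev-token
print(clock.elapsed)  # 7.0
```

With these assumed settings the chain spends 1 + 2 + 4 = 7 seconds in the managed identity step on every call before reaching the next credential, which is the blocking behavior described in the comment.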

deyanp commented, Sep 27, 2021 (0 reactions)

Please note that the initContainers sample

https://github.com/Azure/azure-sdk-for-python/blob/master/sdk/identity/azure-identity/tests/pod-identity/test-pod-identity/templates/job.yaml#L23

must be extended with

&mi_res_id=pod-identity-test

Otherwise, it will check for any managed identity (there might be others on the VM) …
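The linked job.yaml is not reproduced here; the following is a hypothetical Python equivalent of such a readiness check, under stated assumptions. It assumes the standard Azure IMDS token endpoint (http://169.254.169.254/metadata/identity/oauth2/token with api-version 2018-02-01, queried with the header "Metadata: true"), uses https://management.azure.com/ only as an example resource, and takes the mi_res_id value pod-identity-test from the comment. The HTTP call is injected as a fetch callable so the polling loop can run without network access; imds_token_url and wait_for_identity are illustrative names.

```python
import urllib.parse
from typing import Callable

# Assumption: the standard Azure IMDS managed identity token endpoint.
IMDS_TOKEN_URL = "http://169.254.169.254/metadata/identity/oauth2/token"


def imds_token_url(resource: str, mi_res_id: str,
                   api_version: str = "2018-02-01") -> str:
    """Build a token request URL scoped to one identity via mi_res_id."""
    query = urllib.parse.urlencode({
        "api-version": api_version,
        "resource": resource,
        "mi_res_id": mi_res_id,  # without this, any identity on the VM may answer
    })
    return f"{IMDS_TOKEN_URL}?{query}"


def wait_for_identity(fetch: Callable[[str], bool], url: str, max_attempts: int,
                      sleep: Callable[[float], None], delay: float = 2.0) -> bool:
    """Poll until fetch(url) reports success, like an initContainer readiness loop.

    fetch is injected (in a real pod it would be an HTTP GET sent with the
    header 'Metadata: true') so the loop can be tested offline.
    """
    for attempt in range(max_attempts):
        if fetch(url):
            return True
        if attempt < max_attempts - 1:
            sleep(delay)  # no sleep after the final attempt
    return False


# Usage with a simulated endpoint that becomes ready on the third attempt.
url = imds_token_url("https://management.azure.com/", "pod-identity-test")
print(url)
responses = iter([False, False, True])
slept = []
print(wait_for_identity(lambda u: next(responses), url, 5, slept.append))  # True
print(slept)  # [2.0, 2.0]
```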


