[QUERY] CredentialUnavailableException handling when using DefaultAzureCredential and AzureSDKs
Library name and version
Azure.Identity 1.4.1
Query/Question
We have infrastructure that is deployed to AKS. To connect our managed identity in the current subscription, we assign the ManagedIdentity in our resource group to the AAD-PodIdentity system-assigned managed identity. For the most part this works well. Our service is an EventHubProcessor client that is started using the Microsoft Generic Host framework and deployed to our AKS cluster. The service also communicates with CosmosDB and with Azure Storage accounts, both for the Event Hubs checkpoint in blob storage and for storage queues.
The problem occurs when our application starts up and tries to authenticate to our services using DefaultAzureCredential. It appeared as though we had fixed the issue by always “newing” up the DefaultAzureCredential with the ManagedIdentity Client ID passed in and all other authentication methods turned off. Unfortunately, I discovered today that we started getting CredentialUnavailableExceptions for all of our services with the message “Endpoint not found”. I haven’t pinpointed exactly why this issue keeps appearing. It seems to happen immediately on startup, which is OK because our Istio sidecar proxy hasn’t applied the identity binding yet. However, once everything is stood up and our service is running, the credential seems to lose authentication and crashes the application, causing the pod to restart.
The ask here is to figure out what we may need to change, either in our code or in how our infrastructure is set up, to get the proper credentials.
For services like the EventHubProducerClient, Storage, and KeyVault, we use the AzureClientBuilder during registration:
// Resolve the credential through the application's own IIdentityClientFactory.
Func<IServiceProvider, TokenCredential> credentialFactory = (services) =>
{
    // Named differently from the outer credentialFactory delegate to avoid shadowing it.
    var identityClientFactory = services.GetRequiredService<IIdentityClientFactory>();
    return identityClientFactory.GetTokenCredential();
};
builder
    .AddQueueServiceClient(serviceUri)
    .ConfigureOptions(
        opts =>
        {
            opts.MessageEncoding = QueueMessageEncoding.Base64;
        })
    .WithCredential((services) => credentialFactory.Invoke(services));
builder
    .AddBlobServiceClient(blobServiceUri)
    .WithCredential((services) => credentialFactory.Invoke(services));
and passed a credential factory that only does this:
public TokenCredential GetTokenCredential()
{
    DefaultAzureCredentialOptions azureCredentialOptions = TokenCredentialHelper.CreateCredentialOptions(_config["ManagedIdentityClientId"]);
    return new DefaultAzureCredential(azureCredentialOptions);
}
However, with other services like the Cosmos client, we just use the credential factory and pass the credential to the constructor of the Cosmos client, like so:
var credential = _identityClientFactory.GetTokenCredential();
cosmosClient = new CosmosClient(cosmosUri, credential, _cosmosConfig.ClientOptions);
The ask here is: what is the correct way to handle fetching new tokens when getting the CredentialUnavailableException? It seems the tokens are cached when these services are instantiated, and most have Scoped lifetimes to help reinitialize per event received. Is there some sort of retry functionality we should include when CredentialUnavailableException is thrown?
Example stacktrace:
Azure.Identity.CredentialUnavailableException: ManagedIdentityCredential authentication unavailable. No Managed Identity endpoint found.
   at Microsoft.Azure.Cosmos.Routing.GlobalEndpointManager.GetAccountPropertiesHelper.GetAccountPropertiesAsync()
   at Microsoft.Azure.Cosmos.GatewayAccountReader.InitializeReaderAsync()
   at Microsoft.Azure.Cosmos.CosmosAccountServiceConfiguration.InitializeAsync()
   at Microsoft.Azure.Cosmos.DocumentClient.InitializeGatewayConfigurationReaderAsync()
   at Microsoft.Azure.Cosmos.DocumentClient.GetInitializationTaskAsync(IStoreClientFactory storeClientFactory)
   at Microsoft.Azure.Cosmos.DocumentClient.EnsureValidClientAsync(ITrace trace)
   at Microsoft.Azure.Cosmos.DocumentClient.GetCollectionCacheAsync(ITrace trace)
   at Microsoft.Azure.Cosmos.ContainerCore.GetCachedContainerPropertiesAsync(Boolean forceRefresh, ITrace trace, CancellationToken cancellationToken)
   at Microsoft.Azure.Cosmos.ContainerCore.GetPartitionKeyDefinitionAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Cosmos.ContainerCore.ExtractPartitionKeyAndProcessItemStreamAsync[T](Nullable`1 partitionKey, String itemId, T item, OperationType operationType, ItemRequestOptions requestOptions, ITrace trace, CancellationToken cancellationToken)
   at Microsoft.Azure.Cosmos.ContainerCore.CreateItemAsync[T](T item, ITrace trace, Nullable`1 partitionKey, ItemRequestOptions requestOptions, CancellationToken cancellationToken)
   at Microsoft.Azure.Cosmos.ClientContextCore.RunWithDiagnosticsHelperAsync[TResult](ITrace trace, Func`2 task)
   at Microsoft.Azure.Cosmos.ClientContextCore.OperationHelperWithRootTraceAsync[TResult](String operationName, RequestOptions requestOptions, Func`2 task, TraceComponent traceComponent, TraceLevel traceLevel)
   at ResourceScheduler.Data.Infrastructure.Implementations.CosmosDbRepository`1.<>c__DisplayClass16_0.<<CreateAsync>b__0>d.MoveNext() in S:\...\CosmosDbRepository.cs:line 66
--- End of stack trace from previous location ---
   at....Implementations.CosmosDbRepository`1.<CosmosActionWrapper>z__OriginalMethod(String id, Func`1 action)
Environment
.NET 5 Generic Hosting Framework. Deployed to AKS cluster.
Top GitHub Comments
Awesome. Well that solves this issue. Thanks so much for your help!
In version 1.5.0, the retries are handled automatically by DefaultAzureCredential, so you shouldn’t need to retry in your own code. Previously we just tried to make a TCP connection to the endpoint and failed if it did not connect on the first try after less than a second. The new code uses a default retry policy.
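If upgrading to 1.5.0 is not possible right away, one way to approximate this in application code is a small retry helper around the failing call. The following is only a sketch under assumptions: TransientRetry, its parameters, and the doubling delay are hypothetical names and choices made for illustration. They are not part of Azure.Identity and do not reflect how the library implements its own retry policy.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration only; not an Azure SDK type.
public static class TransientRetry
{
    // Runs the operation and retries it, with a doubling delay, while the
    // caller-supplied predicate classifies the thrown exception as transient.
    public static async Task<T> ExecuteAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        Func<Exception, bool> isTransient,
        int maxAttempts = 5,
        TimeSpan? initialDelay = null,
        CancellationToken cancellationToken = default)
    {
        if (operation == null) throw new ArgumentNullException(nameof(operation));
        if (isTransient == null) throw new ArgumentNullException(nameof(isTransient));
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        TimeSpan delay = initialDelay ?? TimeSpan.FromSeconds(1);

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation(cancellationToken).ConfigureAwait(false);
            }
            // The filter lets the final attempt's exception, and any
            // non-transient exception, propagate to the caller unchanged.
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                await Task.Delay(delay, cancellationToken).ConfigureAwait(false);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
        }
    }
}
```

With Azure.Identity, the predicate could be ex => ex is CredentialUnavailableException, wrapped around the call that currently crashes the pod (for example, the Cosmos write shown in the stack trace). The attempt count and delays should be tuned to how long the identity binding usually takes to become available in your cluster.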