CosmosClient stops working with regenerated, hot-reloaded connection strings
Describe the bug
I'm using Azure App Configuration to reload configuration without restarting my application. In the Cosmos case, I add the primary connection string to App Configuration, and on first boot everything works fine. I can replace that primary connection string with the secondary, dispose the CosmosClient, recreate it, and it keeps working.
The problem comes as soon as I try to use a newly regenerated connection string. Whenever a CosmosClient is instantiated with this regenerated key, every request fails with an authorization error (401 Unauthorized in the trace below). The connection string works from a separate application, but in the one that received the change via hot reload it will not work until the web app is restarted. If we switch back to the connection string we just swapped out, it works again.
To Reproduce
1. Instantiate a singleton CosmosClient with the primary connection string.
2. Without restarting the application, regenerate the secondary connection string.
3. Dispose the first CosmosClient.
4. Instantiate a new CosmosClient with the regenerated connection string WITHOUT restarting the application (I use a combination of OptionsMonitor and App Configuration).
5. All subsequent requests fail.
Expected behavior
This should work without issue, since we create a new CosmosClient whenever the connection string changes. Whether the connection string was freshly regenerated should not matter.
Actual behavior
As soon as we try to create a new CosmosClient with a newly regenerated connection string, everything stops working until the application is restarted.
Environment summary
SDK Version: 3.22.0
OS Version: Windows 10
Additional context
Here's how we're handling the lifecycle of the CosmosClient:
This CosmosService is registered as a singleton.
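(The code block from the original issue was not preserved in this archive. Below is a hypothetical reconstruction of the pattern described — a singleton service that disposes and recreates its CosmosClient whenever IOptionsMonitor reports a changed connection string. The names CosmosService and CosmosOptions are illustrative, not taken from the original report.)

```csharp
using System;
using Microsoft.Azure.Cosmos;
using Microsoft.Extensions.Options;

// Hypothetical options type bound to Azure App Configuration.
public sealed class CosmosOptions
{
    public string ConnectionString { get; set; } = string.Empty;
}

// Registered as a singleton; swaps the client when config hot-reloads.
public sealed class CosmosService : IDisposable
{
    private readonly IDisposable? _subscription;
    private CosmosClient _client;

    public CosmosService(IOptionsMonitor<CosmosOptions> monitor)
    {
        _client = new CosmosClient(monitor.CurrentValue.ConnectionString);
        _subscription = monitor.OnChange(opts =>
        {
            // Dispose the old client and build a new one with the
            // hot-reloaded connection string.
            CosmosClient old = _client;
            _client = new CosmosClient(opts.ConnectionString);
            old.Dispose();
        });
    }

    public CosmosClient Client => _client;

    public void Dispose()
    {
        _subscription?.Dispose();
        _client.Dispose();
    }
}
```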
The weird thing is that the CosmosClient seems to be instantiated fine; we see no issue until we run some kind of query through the SDK. The message of the exception that's thrown is:
Response status code does not indicate success: Unauthorized (401); Substatus: 0; ActivityId: 346ff9e1-f42c-4faa-a35f-703a3a1c7bb7; Reason: (The input authorization token can't serve the request. Please check that the expected payload is built as per the protocol, and check the key being used. Server used the following payload to sign: 'get wed, 27 oct 2021 15:50:40 gmt ' ActivityId: 346ff9e1-f42c-4faa-a35f-703a3a1c7bb7, Microsoft.Azure.Documents.Common/2.14.0, Windows/10.0.14393 cosmos-netstandard-sdk/3.23.1);
And the stack trace:
Microsoft.Azure.Cosmos.CosmosException : Response status code does not indicate success: Unauthorized (401); Substatus: 0; ActivityId: 346ff9e1-f42c-4faa-a35f-703a3a1c7bb7; Reason: (The input authorization token can't serve the request. Please check that the expected payload is built as per the protocol, and check the key being used. Server used the following payload to sign: 'get
wed, 27 oct 2021 15:50:40 gmt
ActivityId: 346ff9e1-f42c-4faa-a35f-703a3a1c7bb7, Microsoft.Azure.Documents.Common/2.14.0, Windows/10.0.14393 cosmos-netstandard-sdk/3.23.1);
at Microsoft.Azure.Cosmos.GatewayStoreClient.ParseResponseAsync(HttpResponseMessage responseMessage, JsonSerializerSettings serializerSettings, DocumentServiceRequest request) in C:\src\github\azure-cosmos-dotnet-v3\Microsoft.Azure.Cosmos\src\GatewayStoreClient.cs:line 124
at Microsoft.Azure.Cosmos.GatewayAccountReader.GetDatabaseAccountAsync(Uri serviceEndpoint) in C:\src\github\azure-cosmos-dotnet-v3\Microsoft.Azure.Cosmos\src\GatewayAccountReader.cs:line 57
at Microsoft.Azure.Cosmos.Routing.GlobalEndpointManager.GetAccountPropertiesHelper.GetAndUpdateAccountPropertiesAsync(Uri endpoint) in C:\src\github\azure-cosmos-dotnet-v3\Microsoft.Azure.Cosmos\src\Routing\GlobalEndpointManager.cs:line 300
at Microsoft.Azure.Cosmos.Routing.GlobalEndpointManager.GetAccountPropertiesHelper.GetAccountPropertiesAsync() in C:\src\github\azure-cosmos-dotnet-v3\Microsoft.Azure.Cosmos\src\Routing\GlobalEndpointManager.cs:line 195
at Microsoft.Azure.Cosmos.GatewayAccountReader.InitializeReaderAsync() in C:\src\github\azure-cosmos-dotnet-v3\Microsoft.Azure.Cosmos\src\GatewayAccountReader.cs:line 83
at Microsoft.Azure.Cosmos.CosmosAccountServiceConfiguration.InitializeAsync() in C:\src\github\azure-cosmos-dotnet-v3\Microsoft.Azure.Cosmos\src\Resource\Settings\CosmosAccountServiceConfiguration.cs:line 60
at Microsoft.Azure.Cosmos.DocumentClient.InitializeGatewayConfigurationReaderAsync() in C:\src\github\azure-cosmos-dotnet-v3\Microsoft.Azure.Cosmos\src\DocumentClient.cs:line 6597
at Microsoft.Azure.Cosmos.DocumentClient.GetInitializationTaskAsync(IStoreClientFactory storeClientFactory) in C:\src\github\azure-cosmos-dotnet-v3\Microsoft.Azure.Cosmos\src\DocumentClient.cs:line 958
at Microsoft.Azure.Cosmos.TaskHelper.<>c__DisplayClass0_0.<<InlineIfPossibleAsync>b__0>d.MoveNext() in C:\src\github\azure-cosmos-dotnet-v3\Microsoft.Azure.Cosmos\src\TaskHelper.cs:line 30
--- End of stack trace from previous location ---
at Microsoft.Azure.Documents.BackoffRetryUtility`1.ExecuteRetryAsync(Func`1 callbackMethod, Func`3 callShouldRetry, Func`1 inBackoffAlternateCallbackMethod, TimeSpan minBackoffForInBackoffCallback, CancellationToken cancellationToken, Action`1 preRetryCallback)
at Microsoft.Azure.Documents.ShouldRetryResult.ThrowIfDoneTrying(ExceptionDispatchInfo capturedException)
at Microsoft.Azure.Documents.BackoffRetryUtility`1.ExecuteRetryAsync(Func`1 callbackMethod, Func`3 callShouldRetry, Func`1 inBackoffAlternateCallbackMethod, TimeSpan minBackoffForInBackoffCallback, CancellationToken cancellationToken, Action`1 preRetryCallback)
at Microsoft.Azure.Cosmos.DocumentClient.EnsureValidClientAsync(ITrace trace) in C:\src\github\azure-cosmos-dotnet-v3\Microsoft.Azure.Cosmos\src\DocumentClient.cs:line 1481
--- Cosmos Diagnostics ---{"Summary":{},"name":"Typed FeedIterator ReadNextAsync","id":"a85cd026-7d9f-41b2-9084-83b4811de132","caller info":{"member":"OperationHelperWithRootTraceAsync","file":"ClientContextCore.cs","line":244},"start time":"02:02:36:852","duration in milliseconds":8.869,"data":{"Client Configuration":{"Client Created Time Utc":"2021-10-27T15:50:40.1699497Z","NumberOfClientsCreated":7,"User Agent":"cosmos-netstandard-sdk/3.22.0|3.23.1|7|X86|Microsoft Windows 10.0.14393|.NET 5.0.9|N|","ConnectionConfig":{"gw":"(cps:50, urto:10, p:False, httpf: False)","rntbd":"(cto: 5, icto: 600, mrpc: 30, mcpe: 65535, erd: False, pr: PrivatePortPool)","other":"(ed:False, be:False)"},"ConsistencyConfig":"(consistency: NotSet, prgns:[])"}},"children":[{"name":"Create Query Pipeline","id":"764d347a-3550-4f5c-9533-923fdc3860f2","caller info":{"member":"TryCreateCoreContextAsync","file":"CosmosQueryExecutionContextFactory.cs","line":85},"start time":"02:02:36:852","duration in milliseconds":8.2982,"children":[{"name":"Get Container Properties","id":"fe6057cb-02e6-454a-8cd9-ad43b7afb54f","caller info":{"member":"GetCachedContainerPropertiesAsync","file":"ClientContextCore.cs","line":391},"start time":"02:02:36:852","duration in milliseconds":8.2247,"children":[{"name":"Get Collection Cache","id":"66acd056-cfd8-4267-9497-1788b28703e3","caller info":{"member":"GetCollectionCacheAsync","file":"DocumentClient.cs","line":546},"start time":"02:02:36:852","duration in milliseconds":8.1605,"children":[{"name":"Waiting for Initialization of client to complete","id":"bf575923-99e1-48ba-8330-ae76c27b6fa2","caller info":{"member":"EnsureValidClientAsync","file":"DocumentClient.cs","line":1425},"start time":"02:02:36:852","duration in milliseconds":8.1061}]}]}]},{"name":"POCO Materialization","id":"ca967559-b22f-4ca4-9c58-3946bf213755","caller info":{"member":"ReadNextAsync","file":"FeedIteratorCore.cs","line":247},"start time":"02:02:36:861","duration in milliseconds":0.0491}]}
The way I'm doing this works fine with other services, such as Service Bus, Redis, and SQL Server, which is why I opened the issue here.
Issue Analytics
- State:
- Created 2 years ago
- Comments: 7 (4 by maintainers)
Top GitHub Comments
Yes, key rotations can take a long time; that is why the documentation says the steps are:
I will try to repro this scenario and see what the issue could be.
Tested again, waiting roughly ten minutes between regenerating the key and using the new connection string, and it seems to work fine. I expected it to work as soon as the Azure portal confirms that the regeneration completed successfully, but apparently that is not the case.
There's still a quirk: if a CosmosClient is instantiated with the new connection string without waiting those several minutes, it breaks and never recovers; a new instance has to be created for the new connection string to work. It doesn't matter how long we wait (we even waited a whole weekend), that broken CosmosClient instance will never work.
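Given that a client poisoned by a premature key swap never recovers, one possible workaround is to discard such a client on an authorization failure and retry once on a fresh instance, rather than retrying on the same object. The sketch below is a hypothetical illustration of that idea (the helper name and delegate shape are my own, not part of the SDK):

```csharp
using System;
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class ClientRecovery
{
    // Hypothetical helper: if a request fails with an auth error after a
    // key rotation, the existing CosmosClient instance never recovers, so
    // replace it entirely and retry once on the new instance.
    public static async Task<T> ExecuteWithRecreateAsync<T>(
        Func<CosmosClient, Task<T>> operation,
        Func<CosmosClient> createClient,
        CosmosClient current,
        Action<CosmosClient> replaceClient)
    {
        try
        {
            return await operation(current);
        }
        catch (CosmosException ex) when (
            ex.StatusCode == HttpStatusCode.Unauthorized ||
            ex.StatusCode == HttpStatusCode.Forbidden)
        {
            CosmosClient fresh = createClient(); // new client, same (new) key
            replaceClient(fresh);
            current.Dispose();                   // drop the poisoned client
            return await operation(fresh);       // single retry on the new one
        }
    }
}
```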
Still, thank you for the support and sorry for the bother. Keep up the good work.