OperationCanceledException on first call(s) to Cosmos from Azure App Service
I’ve got a strange issue in an Azure App Service connecting to Cosmos DB.
When I start the App Service up afresh (after deploying or manual restart) then I’m finding that I get the following error on the first call(s) to Cosmos:
System.OperationCanceledException: The operation was canceled.
   at System.Threading.CancellationToken.ThrowOperationCanceledException()
   at Microsoft.Azure.Cosmos.Query.Core.ExecutionContext.CosmosQueryExecutionContextFactory.TryCreateFromPartitionedQuerExecutionInfoAsync(DocumentContainer documentContainer, PartitionedQueryExecutionInfo partitionedQueryExecutionInfo, ContainerQueryProperties containerQueryProperties, CosmosQueryContext cosmosQueryContext, InputParameters inputParameters, ITrace trace, CancellationToken cancellationToken)
   at Microsoft.Azure.Cosmos.Query.Core.ExecutionContext.CosmosQueryExecutionContextFactory.TryCreateCoreContextAsync(DocumentContainer documentContainer, CosmosQueryContext cosmosQueryContext, InputParameters inputParameters, ITrace trace, CancellationToken cancellationToken)
   at Microsoft.Azure.Cosmos.Query.Core.AsyncLazy`1.GetValueAsync(ITrace trace, CancellationToken cancellationToken)
   at Microsoft.Azure.Cosmos.Query.Core.Pipeline.LazyQueryPipelineStage.MoveNextAsync(ITrace trace)
   at Microsoft.Azure.Cosmos.Query.Core.Pipeline.NameCacheStaleRetryQueryPipelineStage.MoveNextAsync(ITrace trace)
   at Microsoft.Azure.Cosmos.Query.Core.Pipeline.CatchAllQueryPipelineStage.MoveNextAsync(ITrace trace)
   at Microsoft.Azure.Cosmos.Query.QueryIterator.ReadNextAsync(ITrace trace, CancellationToken cancellationToken)
CosmosDiagnostics: {"name":"Typed FeedIterator ReadNextAsync","id":"0159ae93-5f41-4379-b2c3-44493e72af14","component":"Unknown","caller info":{"member":"ReadNextWithRootTraceAsync","file":"FeedIteratorInternal{T}.cs","line":31},"start time":"04:25:10:050","duration in milliseconds":1070.2521,"data":{},"children":[{"name":"Create Query Pipeline","id":"de8fe725-edc8-4ed1-9cd0-da2c427a3efd","component":"Query","caller info":{"member":"TryCreateCoreContextAsync","file":"CosmosQueryExecutionContextFactory.cs","line":85},"start time":"04:25:10:099","duration in milliseconds":1012.1531,"data":{},"children":[{"name":"Get Container Properties","id":"e7fd64a5-b0e9-4c38-9ea2-a19c6d00dd99","component":"Transport","caller info":{"member":"GetCachedContainerPropertiesAsync","file":"ClientContextCore.cs","line":349},"start time":"04:25:10:100","duration in milliseconds":0.5946,"data":{},"children":[{"name":"Get Collection Cache","id":"91feaf20-c06b-4c20-8f2e-babdd8fb412a","component":"Routing","caller info":{"member":"GetCollectionCacheAsync","file":"DocumentClient.cs","line":542},"start time":"04:25:10:101","duration in milliseconds":0.0054,"data":{},"children":[]}]},{"name":"Service Interop Query Plan","id":"27a117d7-724e-434b-a257-c2d3cd604672","component":"Query","caller info":{"member":"GetQueryPlanWithServiceInteropAsync","file":"QueryPlanRetriever.cs","line":58},"start time":"04:25:10:109","duration in milliseconds":992.7327,"data":{},"children":[]}]}]}
If I send the request again then it all seems fine.
At first I thought it might be a startup thing, so I tried adding some warm-up code that calls ReadItemStreamAsync for items that don’t exist on each Container. This didn’t work. I have noticed that I can leave the server for as long as I like before the first call, and that first call (or the first few) still fails with the above trace. So it’s like some sort of lazy initialisation takes too long on the first call(s).
Framework version: .NET 5.0
SDK Version: 3.17.0
OS Version: Windows (Azure App Service latest)
App: x64 (win-x64)
Connection: Default Direct
Single Cosmos Database with AutoScale Throughput (up to 4000 RUs)
23 Shared Throughput Containers (default 400 RU base)
Any ideas what could be causing this? I’ve read the performance guide but nothing seems to apply to this problem directly. I’m not sending masses of calls, though; even a single one gets this error.
Issue Analytics
- Created 3 years ago
- Comments:12 (7 by maintainers)
Top GitHub Comments
@andrew-tevent the reason a ReadItem call might not completely warm up all connections is because it is targeted to a particular partition, not all.
When the SDK initializes, it does a couple of things:
A ReadItem call will do all of these. A second ReadItem call will skip the first two, but it might read from a different partition (based on the hash of the partition key value), so it may need to open new TCP connections to that partition. The overhead is smaller than for the first ReadItem, but it can still be there.
Once the TCP connections are established, any request landing on the same partition pays no overhead cost.
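Since a single ReadItem only warms the connections to one partition, one option is to let the SDK open connections for every listed container up front. The sketch below assumes an SDK version that includes CosmosClient.CreateAndInitializeAsync (check the changelog for the version in use). The helper name CreateWarmClientAsync and the database and container names are placeholders for illustration, not part of the original issue.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class CosmosWarmup
{
    // Hypothetical helper: creates a client and asks the SDK to initialize
    // caches and connections for the listed containers before returning.
    public static Task<CosmosClient> CreateWarmClientAsync(string endpoint, string key)
    {
        // Placeholder names; list every container the app queries at runtime.
        var containers = new List<(string databaseId, string containerId)>
        {
            ("my-database", "container-a"),
            ("my-database", "container-b"),
        };

        var options = new CosmosClientOptions
        {
            // Direct mode, matching the configuration reported in the issue.
            ConnectionMode = ConnectionMode.Direct,
        };

        // The returned task completes once initialization has finished.
        return CosmosClient.CreateAndInitializeAsync(endpoint, key, containers, options);
    }
}
```

Awaiting this once during application startup, and then registering the returned client as a singleton, should move the initialization cost off the first user request.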
Yes, you’re right of course - it is a replacement for calling Create on the builder (I knew that but had forgotten!).
My general comments about doing this in a controlled fashion during app startup remain though.
There’s a series of very good blog posts where Andrew Lock walks through ways to structure this within an app.
Perhaps the follow-up post might give you ideas on how to structure it too.