CosmosDb client throwing timeout Exception
See original GitHub issueWe are using “Microsoft.Azure.Cosmos” Version=“3.23.0” in Net Core application. After migrating to the newer client SDK, we are seeing intermittent failures from Cosmos. To Create CosmosClient and container we are using default values for the time out and not specifying any custom value to override it. here is the stack trace of Exception
`
<html> <body>The operation was canceled.CosmosDiagnostics: {“Summary”:{“DirectCalls”:{“(200, 0)”:1,“(204, 0)”:20},“RegionsContacted”:1},“name”:“ReplaceItemAsync”,“id”:“fcafa2a7-6c50-4de5-86e4-a89ea4e0d0aa”,“caller info”:{“member”:“OperationHelperWithRootTraceAsync”,“file”:“ClientContextCore.cs”,“line”:244},“start time”:“01:03:27:626”,“duration in milliseconds”:1907.6462,“data”:{“Client Configuration”:{“Client Created Time Utc”:“2022-02-02T03:37:50.3122530Z”,“NumberOfClientsCreated”:1,“User Agent”:“cosmos-netstandard-sdk/3.23.0|3.23.1|1|X64|Linux 5.4.0-1064-azure 67 18.|.NET Core 3.1.22|L|”,“ConnectionConfig”:{“gw”:“(cps:50, urto:10, p:False, httpf: False)”,“rntbd”:“(cto: 5, icto: -1, mrpc: 30, mcpe: 65535, erd: True, pr: ReuseUnicastPort)”,“other”:“(ed:False, be:False)”},“ConsistencyConfig”:“(consistency: NotSet, prgns:[westus])”},“Operation Cancelled Exception”:“System.OperationCanceledException: The operation was canceled.\n at System.Threading.CancellationToken.ThrowOperationCanceledException()\n at Microsoft.Azure.Documents.TimeoutHelper.ThrowGoneIfElapsed()\n at Microsoft.Azure.Documents.StoreReader.ReadMultipleReplicaAsync(DocumentServiceRequest entity, Boolean includePrimary, Int32 replicaCountToRead, Boolean requiresValidLsn, Boolean useSessionToken, ReadMode readMode, Boolean checkMinLSN, Boolean forceReadAll)\n at Microsoft.Azure.Documents.ConsistencyWriter.WaitForWriteBarrierAsync(DocumentServiceRequest barrierRequest, Int64 selectedGlobalCommittedLsn)\n at Microsoft.Azure.Documents.ConsistencyWriter.WritePrivateAsync(DocumentServiceRequest request, TimeoutHelper timeout, Boolean forceRefresh)\n at Microsoft.Azure.Documents.BackoffRetryUtility1.ExecuteRetryAsync(Func 1 callbackMethod, Func3 callShouldRetry, Func 1 inBackoffAlternateCallbackMethod, TimeSpan minBackoffForInBackoffCallback, CancellationToken cancellationToken, Action1 preRetryCallback)\n at Microsoft.Azure.Documents.ShouldRetryResult.ThrowIfDoneTrying(ExceptionDispatchInfo capturedException)\n at Microsoft.Azure.Documents.BackoffRetryUtility 1.ExecuteRetryAsync(Func1 callbackMethod, Func 3 callShouldRetry, Func1 inBackoffAlternateCallbackMethod, TimeSpan minBackoffForInBackoffCallback, CancellationToken cancellationToken, Action 1 preRetryCallback)\n at Microsoft.Azure.Documents.ConsistencyWriter.WriteAsync(DocumentServiceRequest entity, TimeoutHelper timeout, Boolean forceRefresh, CancellationToken cancellationToken)\n at Microsoft.Azure.Documents.ReplicatedResourceClient.<>c__DisplayClass31_0.<<InvokeAsync>b__0>d.MoveNext()\n— End of stack trace from previous location where exception was thrown —\n at Microsoft.Azure.Documents.RequestRetryUtility.ProcessRequestAsync[TRequest,IRetriableResponse](Func1 executeAsync, Func 1 prepareRequest, IRequestRetryPolicy2 policy, CancellationToken cancellationToken, Func 1 inBackoffAlternateCallbackMethod, Nullable1 minBackoffForInBackoffCallback)\n at Microsoft.Azure.Documents.ShouldRetryResult.ThrowIfDoneTrying(ExceptionDispatchInfo capturedException)\n at Microsoft.Azure.Documents.RequestRetryUtility.ProcessRequestAsync[TRequest,IRetriableResponse](Func 1 executeAsync, Func1 prepareRequest, IRequestRetryPolicy 2 policy, CancellationToken cancellationToken, Func1 inBackoffAlternateCallbackMethod, Nullable 1 minBackoffForInBackoffCallback)\n at Microsoft.Azure.Documents.StoreClient.ProcessMessageAsync(DocumentServiceRequest request, CancellationToken cancellationToken, IRetryPolicy retryPolicy, Func2 prepareRequestAsyncDelegate)\n at Microsoft.Azure.Cosmos.Handlers.TransportHandler.ProcessMessageAsync(RequestMessage request, CancellationToken cancellationToken)\n at Microsoft.Azure.Cosmos.Handlers.TransportHandler.SendAsync(RequestMessage request, CancellationToken cancellationToken)\n at Microsoft.Azure.Cosmos.Handlers.RouterHandler.SendAsync(RequestMessage request, CancellationToken cancellationToken)\n at Microsoft.Azure.Cosmos.RequestHandler.SendAsync(RequestMessage request, CancellationToken cancellationToken)\n at Microsoft.Azure.Cosmos.Handlers.AbstractRetryHandler.ExecuteHttpRequestAsync(Func 1 callbackMethod, Func3 callShouldRetry, Func 3 callShouldRetryException, CancellationToken cancellationToken)\n at Microsoft.Azure.Cosmos.Handlers.AbstractRetryHandler.SendAsync(RequestMessage request, CancellationToken cancellationToken)\n at Microsoft.Azure.Cosmos.RequestHandler.SendAsync(RequestMessage request, CancellationToken cancellationToken)\n at Microsoft.Azure.Cosmos.Handlers.DiagnosticsHandler.SendAsync(RequestMessage request, CancellationToken cancellationToken)\n at Microsoft.Azure.Cosmos.RequestHandler.SendAsync(RequestMessage request, CancellationToken cancellationToken)\n at Team.Services.Common.CosmosDb.Metrics.CosmosDbMetricsRequestHandler.SendAsync(RequestMessage request, CancellationToken cancellationToken)\n at Microsoft.Azure.Cosmos.RequestHandler.SendAsync(RequestMessage request, CancellationToken cancellationToken)\n at Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler.SendAsync(RequestMessage request, CancellationToken cancellationToken)\n at Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler.SendAsync(String resourceUriString, ResourceType resourceType, OperationType operationType, RequestOptions requestOptions, ContainerInternal cosmosContainerCore, FeedRange feedRange, Stream streamPayload, Action1 requestEnricher, ITrace trace, CancellationToken cancellationToken)\n at Microsoft.Azure.Cosmos.ContainerCore.ProcessItemStreamAsync(Nullable 1 partitionKey, String itemId, Stream streamPayload, OperationType operationType, ItemRequestOptions requestOptions, ITrace trace, CancellationToken cancellationToken)\n at Microsoft.Azure.Cosmos.ContainerCore.ExtractPartitionKeyAndProcessItemStreamAsync[T](Nullable1 partitionKey, String itemId, T item, OperationType operationType, ItemRequestOptions requestOptions, ITrace trace, CancellationToken cancellationToken)\n at Microsoft.Azure.Cosmos.ContainerCore.ReplaceItemAsync[T](T item, String id, ITrace trace, Nullable 1 partitionKey, ItemRequestOptions requestOptions, CancellationToken cancellationToken)\n at Microsoft.Azure.Cosmos.ClientContextCore.RunWithDiagnosticsHelperAsync[TResult](ITrace trace, Func`2 task)”},“children”:[{“name”:“ItemSerialize”,“id”:“f1df4548-aca2-4c4c-b3d1-628be7929927”,“caller info”:{“member”:“ExtractPartitionKeyAndProcessItemStreamAsync”,“file”:“ContainerCore.Items.cs”,“line”:941},“start time”:“01:03:27:626”,“duration in milliseconds”:0.037},{“name”:“Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler”,“id”:“297adf96-e2e9-469b-8377-b9112b372d58”,“start time”:“01:03:27:626”,“duration in milliseconds”:1907.3794,“children”:[{“name”:“Team.Services.Common.CosmosDb.Metrics.CosmosDbMetricsRequestHandler”,“id”:“cde494c7-1337-410b-ab0f-d1d134bff1d0”,“start time”:“01:03:27:626”,“duration in milliseconds”:1907.2991,“children”:[{“name”:“Microsoft.Azure.Cosmos.Handlers.DiagnosticsHandler”,“id”:“fa46111b-3207-40f4-83f2-8df7f6c51000”,“start time”:“01:03:27:626”,“duration in milliseconds”:1906.8139,“children”:[{“name”:“Microsoft.Azure.Cosmos.Handlers.RetryHandler”,“id”:“ea078910-bd49-43d2-a2e5-bcf1f75eaa20”,“start time”:“01:03:27:626”,“duration in milliseconds”:1906.7529,“children”:[{“name”:“Microsoft.Azure.Cosmos.Handlers.RouterHandler”,“id”:“b97c43e4-7dd0-425b-ad9b-063d22c3f2e9”,“start time”:“01:03:27:626”,“duration in milliseconds”:1906.6648,“children”:[{“name”:“Microsoft.Azure.Cosmos.Handlers.TransportHandler”,“id”:“9eb400c9-7db3-42f0-a6ca-04b8ea09d665”,“start time”:“01:03:27:626”,“duration in milliseconds”:1906.6244,“children”:[{“name”:“Microsoft.Azure.Documents.ServerStoreModel Transport Request”,“id”:“d0dd59bc-daf8-4277-859d-a006f2c8b3fa”,“caller info”:{“member”:“ProcessMessageAsync”,“file”:“TransportHandler.cs”,“line”:109},“start time”:“01:03:27:626”,“duration in milliseconds”:1906.4729,“data”:{“Client Side Request Stats”:{“Id”:“AggregatedClientSideRequestStatistics”,“ContactedReplicas”:[{“Count”:1,<Removed for security>],“FailedReplicas”:[],“AddressResolutionStatistics”:[],“StoreResponseStatistics”:[{“ResponseTimeUTC”:“2022-02-03T01:03:27.6953683Z”,“ResourceType”:“Document”,“OperationType”:“Replace”,“LocationEndpoint”:“endpoint/”,“StoreResult”:{“ActivityId”:“82a312ab-765d-4362-abc1-257425ef55d5”,“StatusCode”:“Ok”,“SubStatusCode”:“Unknown”,“LSN”:9692316,“PartitionKeyRangeId”:“0”,“GlobalCommittedLSN”:9692310,“ItemLSN”:-1,“UsingLocalLSN”:false,“QuorumAckedLSN”:9692315,“SessionToken”:“0#9692316#7=-1”,“CurrentWriteQuorum”:3,“CurrentReplicaSetSize”:4,“NumberOfReadRegions”:1,“IsValid”:true,“StorePhysicalAddress”:“rntbd://:14068/apps/ba19ffe5-e50d-4d28-9b5d-c069d0e15f9a/services/f707bbef-ed4b-4815-82f7-ccd9e8b7f212/partitions/b691ef48-742a-4c3d-b1c5-1117beeb0a6d/replicas/132876734229942716p/”,“RequestCharge”:16,“RetryAfterInMs”:null,“BELatencyInMs”:“5.287”,“transportRequestTimeline”:{“requestTimeline”:[{“event”: “Created”, “startTimeUtc”: “2022-02-03T01:03:27.6264893Z”, “durationInMs”: 0.0027},{“event”: “ChannelAcquisitionStarted”, “startTimeUtc”: “2022-02-03T01:03:27.6264920Z”, “durationInMs”: 0.009},{“event”: “Pipelined”, “startTimeUtc”: “2022-02-03T01:03:27.6265010Z”, “durationInMs”: 0.122},{“event”: “Transit Time”, “startTimeUtc”: “2022-02-03T01:03:27.6266230Z”, “durationInMs”: 68.6575},{“event”: “Received”, “startTimeUtc”: “2022-02-03T01:03:27.6952805Z”, “durationInMs”: 0.0698},{“event”: “Completed”, “startTimeUtc”: “2022-02-03T01:03:27.6953503Z”, “durationInMs”: 0}],“requestSizeInBytes”:2902,“requestBodySizeInBytes”:851,“responseMetadataSizeInBytes”:500},“TransportException”:null}},{“ResponseTimeUTC”:“2022-02-03T01:03:27.7595265Z”,“ResourceType”:“Collection”,“OperationType”:“Head”,“LocationEndpoint”:“https://”,“StoreResult”:{“ActivityId”:“82a312ab-765d-4362-abc1-257425ef55d5”,“StatusCode”:“NoContent”,“SubStatusCode”:“Unknown”,“LSN”:9692310,“PartitionKeyRangeId”:“0”,“GlobalCommittedLSN”:9692310,“ItemLSN”:8971340,“UsingLocalLSN”:false,“QuorumAckedLSN”:-1,“SessionToken”:“0#9692310#7=-1”,“CurrentWriteQuorum”:-1,“CurrentReplicaSetSize”:-1,“NumberOfReadRegions”:1,“IsValid”:true,“StorePhysicalAddress”:“rntbd://:/apps//services/f707bbef-ed4b-4815-82f7-ccd9e8b7f212/partitions/b691ef48-742a-4c3d-b1c5-1117beeb0a6d/replicas/132860494913145228s/”,“RequestCharge”:0,“RetryAfterInMs”:null,“BELatencyInMs”:“0.451”,“transportRequestTimeline”:{“requestTimeline”:[{“event”: “Created”, “startTimeUtc”: “2022-02-03T01:03:27.6956262Z”, “durationInMs”: 0.0027},{“event”: “ChannelAcquisitionStarted”, “startTimeUtc”: “2022-02-03T01:03:27.6956289Z”, “durationInMs”: 0.0026},{“event”: “Pipelined”, “startTimeUtc”: “2022-02-03T01:03:27.6956315Z”, “durationInMs”: 0.1074},{“event”: “Transit Time”, “startTimeUtc”: “2022-02-03T01:03:27.6957389Z”, “durationInMs”: 63.7201},{“event”: “Received”, “startTimeUtc”: “2022-02-03T01:03:27.7594590Z”, “durationInMs”: 0.0621},{“event”: “Completed”, “startTimeUtc”: “2022-02-03T01:03:27.7595211Z”, “durationInMs”: 0}],“requestSizeInBytes”:1881,“responseMetadataSizeInBytes”:298},“TransportException”:null}},{“ResponseTimeUTC”:“2022-02-03T01:03:27.8362170Z”,“ResourceType”:“Collection”,“OperationType”:“Head”,“LocationEndpoint”:“/”,“StoreResult”:{“ActivityId”:“82a312ab-765d-4362-abc1-257425ef55d5”,“StatusCode”:“NoContent”,“SubStatusCode”:“Unknown”,“LSN”:9692316,“PartitionKeyRangeId”:“0”,“GlobalCommittedLSN”:9692310,“ItemLSN”:8971340,“UsingLocalLSN”:false,“QuorumAckedLSN”:-1,“SessionToken”:“0#9692316#7=-1”,“CurrentWriteQuorum”:-1,“CurrentReplicaSetSize”:-1,“NumberOfReadRegions”:1,“IsValid”:true,“StorePhysicalAddress”:"rntbd://c/apps/ba |
---|
Issue Analytics
- State:
- Created 2 years ago
- Comments:16 (9 by maintainers)
Top Results From Across the Web
Troubleshoot Azure Cosmos DB HTTP 408 or request ...
APPLIES TO: NoSQL. The HTTP 408 error occurs if the SDK was unable to complete the request before the timeout limit occurred.
Read more >Request Timeout in Azure Cosmos DB in sdk v3
I am inserting the data to azure cosmos db. In some time it throws an error (Request Timeout : 408). I have increased...
Read more >Cosmos write timeout exception causing high latency #3798
I get operation timeout errors (408) on writing to cosmosdb using ... Throwing exception explicity as stream API only throws client side ...
Read more >Azure Cosmos DB Troubleshoot Service Availability 503 ...
Learn how to troubleshoot service availability 503 and timeout 408 issues connectivity issues with Azure Cosmos DB.
Read more >Azure Cosmos DB 4xx Status Codes
Request Timeout means the request did not complete in time. This can happen when a stored procedure, trigger, or UDF within a query...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
The CancellationToken is stopping the quorum verification for Strong/Bounded Staleness, that means that quorum, for some reason, is not being achieved within those 2 seconds. The 20 Collection HEAD requests mean that the SDK checked 20 times for quorum until the token cancelled. It would be good to investigate why is it taking that long for quorum. Increasing the token time would certainly help with mitigation.
First step would be to increase the CancellationToken time and see if that impacts. Afterwards, if the issue persists, filing a support ticket should help the service team investigate what could be the reason. For the service ticket it would be ideal to have the full exception if possible (no truncation).