Response status code does not indicate success: RequestTimeout (408)
This is similar to this issue, but in our case the CPU seems to ramp up to 97% and I can't understand why.
Our Cosmos DB is set to auto scale and we haven’t crossed 50% of the max RU consumption in the last 7 days.
The update is requested from an Azure Function v4 (Linux, net6.0, isolated process) on a Premium Plan.
I followed this document: https://docs.microsoft.com/en-us/azure/cosmos-db/sql/troubleshoot-dot-net-sdk-request-timeout?tabs=cpu-new#high-cpu-utilization
and cross-checked all the points:
- All SNAT connections were successful (latest 24h)
- We use our CosmosContext that inherits from DbContext:
  services.AddDbContext<CosmosContext>(options =>
  {
      options.UseCosmos(configuration[AppSettingsKeys.CosmosDbConnection], "somenamehere");
  });
  which internally creates a singleton of CosmosClient (see the configuration sketch after this list)
- We are nowhere near the service limits
- There is no HTTP proxy
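For reference, here is a hedged sketch of the same registration with the provider's Cosmos connection options made explicit (the ConnectionMode and RequestTimeout values are placeholder assumptions, not our production settings); it only shows where the connection behaviour could be tuned while investigating the timeouts:

// Requires the Microsoft.EntityFrameworkCore.Cosmos provider package;
// ConnectionMode comes from the Microsoft.Azure.Cosmos namespace.
services.AddDbContext<CosmosContext>(options =>
{
    options.UseCosmos(
        configuration[AppSettingsKeys.CosmosDbConnection],
        "somenamehere",
        cosmosOptions =>
        {
            // Placeholder values for illustration only.
            cosmosOptions.ConnectionMode(ConnectionMode.Direct);     // Direct is the SDK's default mode
            cosmosOptions.RequestTimeout(TimeSpan.FromSeconds(30));
        });
});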
This happens in a particular function that pulls the document and updates nested properties. The document is around 40 KB. The function has a Service Bus trigger and the following retry policy:
"retry": {
"strategy": "exponentialBackoff",
"maxRetryCount": 3,
"minimumInterval": "00:00:03",
"maximumInterval": "00:00:10"
},
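The policy above only re-runs the whole Functions invocation. As a comparison, here is a hedged sketch of retrying the transient Cosmos status codes in code (the helper name, the status-code list and the back-off values are assumptions; the EF Core Cosmos provider may surface the failure either as a CosmosException or wrapped in another exception such as DbUpdateException):

using System;
using System.Net;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;            // CosmosException

internal static class CosmosSaveRetry    // hypothetical helper, name is made up
{
    public static async Task SaveWithRetryAsync(CosmosContext context, CancellationToken ct)
    {
        const int maxAttempts = 3;
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                await context.SaveChangesAsync(ct);
                return;
            }
            catch (Exception ex) when (attempt < maxAttempts && IsTransient(ex))
            {
                // Assumed back-off values; tune to the workload.
                await Task.Delay(TimeSpan.FromSeconds(attempt * 2), ct);
            }
        }
    }

    private static bool IsTransient(Exception ex)
    {
        // The provider may throw CosmosException directly or wrap it.
        var cosmosEx = ex as CosmosException ?? ex.InnerException as CosmosException;
        return cosmosEx is not null &&
               cosmosEx.StatusCode is HttpStatusCode.RequestTimeout       // 408
                                   or HttpStatusCode.ServiceUnavailable   // 503
                                   or HttpStatusCode.TooManyRequests;     // 429
    }
}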
I have no idea what's going on.
Here are the diagnostics recorded in the exception details:
"Diagnostics":{
"name":"ReplaceItemStreamAsync",
"id":"d4330cac-9cd4-4fb9-ac70-26a0942b96a6",
"caller info":{
"member":"OperationHelperWithRootTraceAsync",
"file":"ClientContextCore.cs",
"line":244
},
"start time":"10:45:08:241",
"duration in milliseconds":12210.9945,
"data":{
"Client Configuration":{
"Client Created Time Utc":"2022-06-10T11:56:21.5647195Z",
"NumberOfClientsCreated":2,
"User Agent":"cosmos-netstandard-sdk/3.21.0|3.21.1|2|X64|Linux 5.4.0-1074-azure 77 18.|.NET 6.0.5|N| Microsoft.EntityFrameworkCore.Cosmos/6.0.5",
"ConnectionConfig":{
"gw":"(cps:50, urto:10, p:False, httpf: False)",
"rntbd":"(cto: 5, icto: -1, mrpc: 30, mcpe: 65535, erd: True, pr: ReuseUnicastPort)",
"other":"(ed:False, be:False)"
},
"ConsistencyConfig":"(consistency: NotSet, prgns:[])"
}
},
"children":[
{
"name":"Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler",
"id":"3892e9c8-a327-4ae2-a1b4-4b30b552721c",
"start time":"10:45:08:241",
"duration in milliseconds":12210.9644,
"children":[
{
"name":"Microsoft.Azure.Cosmos.Handlers.DiagnosticsHandler",
"id":"042f8751-3514-46ac-bd3b-e51ff061ac70",
"start time":"10:45:08:241",
"duration in milliseconds":12210.932,
"data":{
"System Info":{
"systemHistory":[
{
"dateUtc":"2022-06-14T10:44:12.4755898Z",
"cpu":9.907,
"memory":3178468.000,
"threadInfo":{
"isThreadStarving":"False",
"threadWaitIntervalInMs":0.0213,
"availableThreads":32766,
"minThreads":2,
"maxThreads":32767
}
},
{
"dateUtc":"2022-06-14T10:44:22.4788493Z",
"cpu":4.343,
"memory":3178484.000,
"threadInfo":{
"isThreadStarving":"False",
"threadWaitIntervalInMs":0.0088,
"availableThreads":32766,
"minThreads":2,
"maxThreads":32767
}
},
{
"dateUtc":"2022-06-14T10:44:39.0703495Z",
"cpu":79.250,
"memory":3484276.000,
"threadInfo":{
"isThreadStarving":"False",
"threadWaitIntervalInMs":0.209,
"availableThreads":32756,
"minThreads":2,
"maxThreads":32767
}
},
{
"dateUtc":"2022-06-14T10:44:51.4720374Z",
"cpu":79.208,
"memory":2110288.000,
"threadInfo":{
"isThreadStarving":"False",
"threadWaitIntervalInMs":6.154,
"availableThreads":32737,
"minThreads":2,
"maxThreads":32767
}
},
{
"dateUtc":"2022-06-14T10:45:01.5421178Z",
"cpu":82.129,
"memory":959112.000,
"threadInfo":{
"isThreadStarving":"False",
"threadWaitIntervalInMs":0.3395,
"availableThreads":32732,
"minThreads":2,
"maxThreads":32767
}
},
{
"dateUtc":"2022-06-14T10:45:20.1404512Z",
"cpu":97.987,
"memory":1891392.000,
"threadInfo":{
"isThreadStarving":"False",
"threadWaitIntervalInMs":1.2721,
"availableThreads":32730,
"minThreads":2,
"maxThreads":32767
}
}
]
}
},
"children":[
{
"name":"Microsoft.Azure.Cosmos.Handlers.RetryHandler",
"id":"b87c1d09-2c23-470f-988e-70558cfcdcb5",
"start time":"10:45:08:241",
"duration in milliseconds":12210.9261,
"children":[
{
"name":"Microsoft.Azure.Cosmos.Handlers.RouterHandler",
"id":"b513f900-a379-4bfe-b5f3-9d52d15398ff",
"start time":"10:45:08:241",
"duration in milliseconds":12210.7416,
"children":[
{
"name":"Microsoft.Azure.Cosmos.Handlers.TransportHandler",
"id":"27ab336b-34d4-405d-9534-ab79980d0b29",
"start time":"10:45:08:241",
"duration in milliseconds":12210.6676,
"children":[
{
"name":"Microsoft.Azure.Documents.ServerStoreModel Transport Request",
"id":"ee060395-4562-4b8c-a6b8-c24daf7d3e45",
"caller info":{
"member":"ProcessMessageAsync",
"file":"TransportHandler.cs",
"line":109
},
"start time":"10:45:08:241",
"duration in milliseconds":12169.0857,
"data":{
"Client Side Request Stats":{
"Id":"AggregatedClientSideRequestStatistics",
"ContactedReplicas":[
{
"Count":1,
"Uri":""
},
{
"Count":1,
"Uri":""
},
{
"Count":1,
"Uri":""
}
],
"RegionsContacted":[
],
"FailedReplicas":[
],
"AddressResolutionStatistics":[
],
"StoreResponseStatistics":[
]
}
}
}
]
}
]
}
]
}
]
}
]
}
]
}
Additionally I get “ghost updates”:
product.UpdateStock(5);
await _cosmosContext.SaveChangesAsync(CancellationToken);
_logger.Information("Stock Update {@Request}", new
{
product.StockQuantity,
});
The log tells me it has updated the document (product.StockQuantity = 5), but querying the actual document reveals it still holds the value from the previous update (product.StockQuantity = 0).
No exception is thrown related to this particular update.
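To double-check whether this is a stale in-memory read rather than a missed write, here is a hedged verification sketch (the Products DbSet and the Id property are assumptions about our model) that re-queries with AsNoTracking right after the save and logs what Cosmos actually returns:

// Requires Microsoft.EntityFrameworkCore for AsNoTracking / FirstOrDefaultAsync.
product.UpdateStock(5);
await _cosmosContext.SaveChangesAsync(CancellationToken);

// Re-read the document without the change tracker, so the value comes from Cosmos
// rather than from the entity instance that was just modified in memory.
var persisted = await _cosmosContext.Products
    .AsNoTracking()
    .FirstOrDefaultAsync(p => p.Id == product.Id, CancellationToken);

_logger.Information("Stock Update {@Request}", new
{
    InMemory = product.StockQuantity,
    Persisted = persisted?.StockQuantity,
});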
Comments: 6 (2 by maintainers)

There are still 2 clients being created and active; is this what you expect?
This is a timeout; there are two potential issues:
- You have high Transit Time, meaning something is not entirely right in the network (2 seconds for a request is massive).
- Very high time on Received: this means the response is sitting there, waiting ~8 seconds to be consumed, which points to thread-pool issues. The I/O response is an async operation, and this is the time before the async Task is processed, meaning the thread pool cannot assign a thread to continue that async Task for 8 seconds. This usually points at code in the app blocking threads (https://docs.microsoft.com/en-us/azure/cosmos-db/sql/troubleshoot-dot-net-sdk-slow-request?tabs=cpu-new#rntbdRequestStats), i.e. code that is not following async/await and is using .Result / GetAwaiter().GetResult() / etc., which blocks threads and prevents the thread pool from using them to resume async operations. This can also lead to high CPU usage. Useful guide: https://github.com/davidfowl/AspNetCoreDiagnosticScenarios/blob/master/AsyncGuidance.md#avoid-using-taskresult-and-taskwait (a minimal sketch of the pattern follows at the end of this comment).

CPU values in Linux are obtained from /proc/stat/cpu; it's the system-wide CPU. I don't know what those metrics in the Portal read.

Transient timeouts can happen and the app should have some way to handle them: https://docs.microsoft.com/en-us/azure/cosmos-db/sql/conceptual-resilient-sdk-applications#timeouts-and-connectivity-related-failures-http-408503
It’s when the volume affects P99 that you should investigate: https://docs.microsoft.com/en-us/azure/cosmos-db/sql/conceptual-resilient-sdk-applications#when-to-contact-customer-support
Reference: https://docs.microsoft.com/en-us/azure/cosmos-db/sql/troubleshoot-dot-net-sdk-request-timeout?tabs=cpu-new#troubleshooting-steps
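A hedged, self-contained illustration of the blocking pattern described above (the class and method names are made up; only the contrast between the two calls matters):

using System.Threading.Tasks;

public class StockUpdater                       // illustrative names only
{
    private readonly CosmosContext _context;

    public StockUpdater(CosmosContext context) => _context = context;

    // Anti-pattern: blocks a thread-pool thread until the async save completes.
    // Under load this starves the pool, delays async I/O continuations
    // (the ~8 second "Received" time) and can also drive CPU up.
    public void SaveBlocking()
        => _context.SaveChangesAsync().GetAwaiter().GetResult();

    // Preferred: the thread is released while the I/O is in flight and the
    // continuation runs when the response arrives.
    public async Task SaveAsync()
        => await _context.SaveChangesAsync();
}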
Please update the SDK to a newer version and share the updated diagnostics. The version you are using does not include diagnostics for timeouts (added in 3.24: https://github.com/Azure/azure-cosmos-dotnet-v3/blob/master/changelog.md#-3240---2022-01-31).
The only thing we can see is that there seem to be 2 clients: "NumberOfClientsCreated":2.

We cannot tell you why your CPU is high; CPU analysis needs to be performed on the running machine.