High memory usage
Describe the bug
We have a service running on two nodes. After a load test I see a sharp rise in memory usage. I took a memory dump of both nodes and see that most of the memory is consumed by:
- Task+ContingentProperties (about 23%)
- Task&lt;Object&gt; (about 21%)
- PooledTimer (about 19%)

Looking at Retained Size, TimerPool accounts for 84% (502 instances).
My application is reading and writing to Azure CosmosDB in a highly parallelized way (several Parallel.ForEachAsync).
To Reproduce
Sorry, but I can't reproduce this outside of the test environment.
Expected behavior
Memory consumption stays low.

Actual behavior
Server memory is eaten up somewhere.
Environment summary
SDK Version: 3.35.2 (Microsoft.Azure.Cosmos)
OS Version: Windows Server 2019 Datacenter
Additional context
I guess this memory is consumed by the connection pool for Cosmos DB. I have configured it as
MaxTcpConnectionsPerEndpoint = 1000, // Default: 65535
IdleTcpConnectionTimeout = TimeSpan.FromHours(2), // Default: unlimited
to limit the memory usage.
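For context, a minimal sketch of how the two options above are typically applied when constructing the client; the endpoint and key strings are placeholders, and this assumes a single client instance for the whole process:

```csharp
using System;
using Microsoft.Azure.Cosmos;

var options = new CosmosClientOptions
{
    // Cap the number of TCP connections the client may open per backend endpoint.
    MaxTcpConnectionsPerEndpoint = 1000,              // Default: 65535
    // Close connections that have been idle for this long.
    IdleTcpConnectionTimeout = TimeSpan.FromHours(2)  // Default: unlimited
};

// One client, reused everywhere; each CosmosClient owns its own connection pool.
CosmosClient client = new CosmosClient("<account-endpoint>", "<account-key>", options);
```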
Sorry if this is the wrong place for this question, but I don't know where else to ask.
Issue Analytics
- Created: a month ago
- Reactions: 1
- Comments: 14 (6 by maintainers)
Top GitHub Comments
This is the problem: you have 333 client instances. Each client instance has an independent connection pool, which is why it's critical to follow the singleton pattern, as our guidelines say. Connections are not shared across client instances.
The large number of PooledTimer objects is probably due to the large number of client instances.
Where are these instances created? Only you can tell.
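To make the singleton guidance concrete, here is a sketch of registering one process-wide CosmosClient with Microsoft.Extensions.DependencyInjection so every consumer shares the same connection pool; the endpoint and key are placeholders:

```csharp
using Microsoft.Azure.Cosmos;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();

// Register exactly one CosmosClient for the lifetime of the process.
// The factory runs once; all consumers get the same instance.
services.AddSingleton(sp =>
    new CosmosClient("<account-endpoint>", "<account-key>"));

using var provider = services.BuildServiceProvider();

// Resolving twice yields the same client (and the same connection pool).
CosmosClient client = provider.GetRequiredService<CosmosClient>();
```

Creating a new CosmosClient per request or per parallel task, by contrast, multiplies connection pools and timers, which matches the symptoms described above.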
MaxRetryWaitTimeOnRateLimitedRequests has nothing to do with this; it is the setting that governs retries on 429s. Two hours for RequestTimeout is really pointless. RequestTimeout is meant to govern the maximum latency a TCP request can have on the wire; normal values range from 1 to 60 seconds. It means "my application will only consider it acceptable to wait up to X for a network request, after which it is a timeout." I doubt your application can wait 2 hours for a single network request; if it could, no user would be waiting for that response. https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/troubleshoot-dotnet-sdk-request-timeout?tabs=cpu-new#customize-the-timeout-on-the-azure-cosmos-db-net-sdk
Standard_B2ms has 2 cores. Based on https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/best-practice-dotnet, you probably should be using Gateway mode.
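The two suggestions above can be sketched together as a client configuration; the 30-second value is just an illustrative choice within the normal 1-60 second range, and the endpoint and key are placeholders:

```csharp
using System;
using Microsoft.Azure.Cosmos;

var options = new CosmosClientOptions
{
    // Gateway mode is recommended for low-core VMs such as Standard_B2ms.
    ConnectionMode = ConnectionMode.Gateway,
    // A sane per-request network timeout, instead of 2 hours.
    RequestTimeout = TimeSpan.FromSeconds(30)
};

CosmosClient client = new CosmosClient("<account-endpoint>", "<account-key>", options);
```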
Source is: https://github.com/Azure/azure-cosmos-dotnet-v3/blob/msdata/direct/Microsoft.Azure.Cosmos/src/direct/PooledTimer.cs