"async" fails as the number of threads increases
UPDATE 03/05/2021:
A few months ago we discovered that this was somehow tied to the use of the async keyword, and have since been able to prove it in the AcmeWebApi test application. As @davidfowl has stated in our other thread, this is most likely a race condition. Since it is virtually impossible to write any application that doesn’t use async in some way now (due to the core library changes), it is not possible to run tests that only use synchronous code. I can say that when we did have synchronous code, we were seeing drastically higher performance than we are currently seeing.
Leaving the following, as that is what we started the thread with:
We have run into several issues with ASP.NET Core that appear to be threading related. I initially created #26955, as that was the first issue that we ran into, but creating an application that can be tested is still ongoing. In the process of creating an application for that purpose, we were able to replicate another issue, which is the topic of this thread. The application linked below replicates this issue under the following conditions:
- 3,500+ concurrent clients (we are required to exceed 8,000).
- Average throughput in total across all connections is 6,000 req/s (this is our minimum for maximum throughput).
- VM has 2 dedicated CPUs and 4GB RAM.
Under these conditions we observe numerous HeartbeatSlow issues across random connections (threads), which in our full application leads to complete system failure over time. We are working on providing ways to replicate the other issues that we have observed, but these are currently the only ones that we can replicate for you in a test application.
This issue ONLY exists on Linux (we used Ubuntu 20.04.1 LTS for verification) and results in both a significant reduction of throughput and a significantly higher latency. When running in our full application, this issue, along with others, causes a complete system failure (APPCRASH) as the process runs out of memory. No matter how much we try, this issue, and the others, cannot be replicated in a Windows environment (Windows 10 and Windows Server 2019 were tested).
SDK: 3.1.301 VS: 16.8.2
The test application is available in the private repo AcmeWebApi.
FULL DISCLOSURE: I work for Webroot / Carbonite / OpenText and the application discussed above is the property of said entities. Microsoft is a direct / indirect customer of ours, so I am limited on the information that I am allowed to provide.
Issue Analytics
- Created 3 years ago
- Comments:52 (27 by maintainers)
Based on https://github.com/Kaelum/AcmeWebApi/blob/main/src/AcmeWebApi/Services/ApiService.cs#L329 and https://github.com/Kaelum/AcmeWebApi/blob/main/src/AcmeWebApi/Handlers/TcpHandler.cs#L415-L420, you are probably filling up the thread pool, since you use 3,500 simultaneous connections that are all queued and each block a thread.
I don’t think it shows a bug in aspnet. Do you want to explain what you are trying to achieve to get some advice on how to approach it differently?
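To make the thread pool point concrete, here is a minimal sketch. It is not taken from AcmeWebApi: the names StarvationDemo, RunBlocking, and RunAwaitingAsync are hypothetical, and Thread.Sleep / Task.Delay are only stand-ins for a blocking versus an awaited socket read. It queues the same number of work items both ways and times them. The blocking variant should take noticeably longer, because each item holds a pool thread for the whole wait and the pool only adds threads gradually.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class StarvationDemo
{
    // Assumed wait time standing in for one slow I/O operation.
    private const int WaitMs = 200;

    // Blocking variant: every queued item occupies a thread pool thread
    // for the full wait, so items beyond the available threads must queue.
    public static TimeSpan RunBlocking(int count)
    {
        var sw = Stopwatch.StartNew();
        Task[] tasks = Enumerable.Range(0, count)
            .Select(_ => Task.Run(() => Thread.Sleep(WaitMs)))
            .ToArray();
        Task.WaitAll(tasks);
        return sw.Elapsed;
    }

    // Awaiting variant: the thread goes back to the pool during the wait,
    // so a small number of threads can service all items.
    public static async Task<TimeSpan> RunAwaitingAsync(int count)
    {
        var sw = Stopwatch.StartNew();
        Task[] tasks = Enumerable.Range(0, count)
            .Select(_ => Task.Run(async () => await Task.Delay(WaitMs)))
            .ToArray();
        await Task.WhenAll(tasks);
        return sw.Elapsed;
    }

    public static async Task Main()
    {
        // Queue several items per core so the blocking run exceeds
        // the pool's initial thread count.
        int count = Environment.ProcessorCount * 8;

        TimeSpan blocking = RunBlocking(count);
        TimeSpan awaiting = await RunAwaitingAsync(count);

        Console.WriteLine($"blocking: {blocking.TotalMilliseconds:F0} ms");
        Console.WriteLine($"awaiting: {awaiting.TotalMilliseconds:F0} ms");
    }
}
```

With 3,500 connections the same effect is much larger. If each connection blocks a thread, the pool has to grow to thousands of threads before the remaining queued work (including timers and heartbeats) gets a chance to run.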
If the application is running out of memory, collecting a memory dump might help. Once the application starts to struggle under load, try running dotnet-dump collect and taking a look at it with dotnet-dump analyze. dumpheap -stat, clrstack -all, and dumpasync are all interesting commands to take a look at. https://docs.microsoft.com/en-us/dotnet/core/diagnostics/debug-memory-leak