CounterEvent's synchronous design causes thread starvation under load
Which service (blob, file, queue, table) does this issue concern?
Blob
Which version of the SDK was used?
9.3.0
Which platform are you using? (ex: .NET Core 2.1)
.NET Core 2.1
What problem was encountered?
The type CounterEvent is used in the stream implementations to wait for all pending operations to finish before returning from a flush operation.
The implementation of CounterEvent is synchronous and based on an underlying ManualResetEvent. Stream implementations queue up a thread pool operation to wait for the counter to reach zero. This thread pool operation gets scheduled on a dedicated thread, which then blocks for the duration of the wait.
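For illustration, here is a minimal sketch of that pattern (this is not the SDK's exact code; it assumes the stream holds a CounterEvent field named noPendingWritesEvent, as in the snippet further below, and that CounterEvent exposes a blocking Wait() over its ManualResetEvent):

```csharp
// Illustrative sketch only, not the SDK's actual implementation.
// A work item is queued to the thread pool whose only job is to block on the
// counter's ManualResetEvent until all pending writes have completed, so each
// in-flight flush ties up a thread-pool thread for the whole wait.
await Task.Run(() => this.noPendingWritesEvent.Wait());
```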
This has turned out to be a significant scalability bottleneck for us, as high concurrency quickly leads to thread pool starvation, with many of these threads stuck in this waiting state for long periods.
Have you found a mitigation/solution?
In our fork, we changed the CounterEvent implementation to use an AsyncManualResetEvent behind the scenes and to provide a WaitAsync() method which the stream implementations can use to wait without blocking a whole thread:
await this.noPendingWritesEvent.WaitAsync(cancellationToken);
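For reference, a minimal sketch of this approach (not the verbatim code from our fork; it assumes an AsyncManualResetEvent with Set/Reset/WaitAsync such as the one in Nito.AsyncEx, and the Increment/DecrementAndSignal member names are used here purely for illustration):

```csharp
using System.Threading;
using System.Threading.Tasks;
using Nito.AsyncEx; // assumed: any AsyncManualResetEvent with Set/Reset/WaitAsync would do

// Sketch of an async-friendly CounterEvent. The event starts set (count == 0),
// Increment() resets it while work is outstanding, and DecrementAndSignal()
// sets it again once the count returns to zero.
internal sealed class CounterEvent
{
    private readonly AsyncManualResetEvent internalEvent = new AsyncManualResetEvent(true);
    private readonly object counterLock = new object();
    private int counter;

    public void Increment()
    {
        lock (this.counterLock)
        {
            this.counter++;
            this.internalEvent.Reset();
        }
    }

    public void DecrementAndSignal()
    {
        lock (this.counterLock)
        {
            if (--this.counter == 0)
            {
                this.internalEvent.Set();
            }
        }
    }

    // Awaited by the stream implementations instead of blocking a thread;
    // the cancellation-aware WaitAsync overload on the async event is assumed.
    public Task WaitAsync(CancellationToken cancellationToken)
    {
        return this.internalEvent.WaitAsync(cancellationToken);
    }
}
```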
An even cleaner alternative could be to replace the CounterEvent completely with an AsyncCountdownEvent.
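A rough sketch of what that could look like (hypothetical class and member names; it assumes a countdown event with AddCount/Signal/WaitAsync along the lines of Nito.AsyncEx's AsyncCountdownEvent):

```csharp
using System.Threading.Tasks;
using Nito.AsyncEx; // assumed: any async countdown event with AddCount/Signal/WaitAsync

// Hypothetical sketch: the count starts at 1 so it cannot reach zero while
// writes are still being queued; the extra Signal() in CommitAsync releases
// that baseline once no further writes will be started.
internal sealed class PendingWriteTracker
{
    private readonly AsyncCountdownEvent pendingWrites = new AsyncCountdownEvent(1);

    public void WriteStarted() => this.pendingWrites.AddCount();

    public void WriteCompleted() => this.pendingWrites.Signal();

    public Task CommitAsync()
    {
        this.pendingWrites.Signal();           // release the baseline count
        return this.pendingWrites.WaitAsync(); // completes once all writes have finished
    }
}
```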
This change removed the scalability bottleneck for us and allowed our service to scale almost without limit: throughput increased by at least 20x (at which point we stopped measuring), and the CPU and the capacity of the backend blob storage account are now the only limits.
Top GitHub Comments
@DaRosenberg It looks like an implementation of CounterEventAsync made it into our split library some time ago, but hasn’t been put to use in the methods you’re discussing. I’ve made some changes and am running tests now.
@DaRosenberg We do accept community contributions. Evidently, we have not been as responsive to all PRs as we ought to have been, and I am very sorry for the frustration this has caused. Let me discuss with my team on Monday morning why we have not reviewed your other PR and whether we would realistically be able to review a PR you submit for this before you go through the hassle of doing so.
Thank you again for your communication, contributions, and patience.