CounterEvent's synchronous design causes thread starvation under load

Which service(blob, file, queue, table) does this issue concern?

Blob

Which version of the SDK was used?

9.3.0

Which platform are you using? (ex: .NET Core 2.1)

.NET Core 2.1

What problem was encountered?

The type CounterEvent is used in the stream implementations to wait for all pending operations to finish before returning from a flush operation.

The implementation of CounterEvent is synchronous and based on an underlying ManualResetEvent. Stream implementations queue up a thread pool operation to wait for the counter to reach zero. This operation occupies a thread pool thread, which then blocks for the duration of the wait:

https://github.com/Azure/azure-storage-net/blob/38425e715e1bcdb4cab344bcb9b448c08bf8af5c/Lib/WindowsRuntime/Blob/BlobWriteStream.cs#L192

This has turned out to be a significant scalability bottleneck for us: under high concurrency, many of these threads sit in this waiting state for a long time, which quickly leads to thread pool starvation.

Have you found a mitigation/solution?

In our fork, we changed the CounterEvent implementation to use an AsyncManualResetEvent behind the scenes and to provide a WaitAsync() method, which the stream implementations can use to wait without blocking a whole thread:

await this.noPendingWritesEvent.WaitAsync(cancellationToken);

An even cleaner alternative could be to replace the CounterEvent completely with an AsyncCountdownEvent.
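
To make the idea concrete, here is a minimal sketch of a counter event that can be awaited instead of blocked on. It is not the SDK's CounterEvent and not the code from our fork: the class name AsyncCounterEvent and its members are hypothetical, and it is built on TaskCompletionSource from the base class library rather than on AsyncManualResetEvent or AsyncCountdownEvent, which are not BCL types and come from third-party libraries such as Nito.AsyncEx.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical illustration only; names and behavior are assumptions,
// not the SDK's CounterEvent or any library's API.
public sealed class AsyncCounterEvent
{
    private readonly object gate = new object();
    private int count;

    // Completed while the counter is zero; replaced by a fresh,
    // incomplete source whenever the counter leaves zero.
    private TaskCompletionSource<bool> zero = NewSource(completed: true);

    private static TaskCompletionSource<bool> NewSource(bool completed)
    {
        // Run continuations asynchronously so that Decrement() never
        // executes a waiter's continuation inline.
        var tcs = new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        if (completed)
        {
            tcs.SetResult(true);
        }
        return tcs;
    }

    public void Increment()
    {
        lock (gate)
        {
            if (count++ == 0)
            {
                zero = NewSource(completed: false);
            }
        }
    }

    public void Decrement()
    {
        TaskCompletionSource<bool> toComplete = null;
        lock (gate)
        {
            if (count == 0)
            {
                throw new InvalidOperationException("Counter is already zero.");
            }
            if (--count == 0)
            {
                toComplete = zero;
            }
        }
        // Signal outside the lock.
        toComplete?.TrySetResult(true);
    }

    // Returns a task that completes when the counter is zero.
    // No thread is blocked while the caller awaits it.
    public Task WaitAsync(CancellationToken cancellationToken = default)
    {
        Task task;
        lock (gate)
        {
            task = zero.Task;
        }
        if (task.IsCompleted || !cancellationToken.CanBeCanceled)
        {
            return task;
        }
        return WaitWithCancellationAsync(task, cancellationToken);
    }

    private static async Task WaitWithCancellationAsync(
        Task task, CancellationToken cancellationToken)
    {
        var cancelled = new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        using (cancellationToken.Register(() => cancelled.TrySetResult(true)))
        {
            var finished = await Task.WhenAny(task, cancelled.Task)
                .ConfigureAwait(false);
            if (finished != task)
            {
                throw new OperationCanceledException(cancellationToken);
            }
        }
    }
}
```

With something like this, a write path would call Increment() when it starts an operation and Decrement() when the operation completes, and a flush would await WaitAsync(cancellationToken). The waiting flush is then just a pending task, not a parked thread pool thread.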

This change removed the scalability bottleneck for us and allowed our service to scale almost perfectly. Scalability increased by at least 20x (we stopped measuring), and the CPU and the capacity of the backend blob storage account are now the only limits.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 1
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
kfarmer-msft commented, Nov 29, 2018

@DaRosenberg

It looks like an implementation of CounterEventAsync made it into our split library some time ago, but hasn’t been put to use in the methods you’re discussing. I’ve made some changes and am running tests now.

1 reaction
rickle-msft commented, Aug 18, 2018

@DaRosenberg We do accept community contributions. Evidently, we have not been as responsive to all PRs as we ought to have been, and I am very sorry for the frustration this has caused. Let me discuss with my team on Monday morning why we have not reviewed your other PR and whether we would realistically be able to review a PR you submit for this before you go through the hassle of doing so.

Thank you again for your communication, contributions, and patience.
