Client command dispatcher thread dead lock


Hi!

Symptoms

Over the last week, one of our services has started to stop processing messages from RabbitMQ at random times. When this happens the service isn't consuming CPU or memory, and the connection between the service and RabbitMQ is still alive, but the service does not consume new messages. It happens every day on all replicas of the service, and fixing it is very important to us because this service is essential for our product.

The first few times we just restarted the service, but the issue kept recurring. The last time, we took a process dump, and it held a surprise: the Client Command Dispatcher Thread was executing our message-processing code. That code publishes further messages, and in the dump the Client Command Dispatcher Thread was blocked in the RabbitBus.PublishAsync method, trying to add an action to the queue inside ClientCommandDispatcherSingleton. However, the Client Command Dispatcher Thread is the thread that is supposed to take items from this queue, and that is how we get a deadlock.
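The blocking mechanism can be sketched in isolation. This is not EasyNetQ's code: the `DispatcherDeadlockDemo` class, the capacity of 1, and the 200 ms timeout are assumptions chosen purely for illustration. A single thread drains a bounded BlockingCollection<Action>, and an action running on that thread tries to add to the same queue while it is full. TryAdd with a timeout is used so that the demo terminates; a plain Add at that point would block forever.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class DispatcherDeadlockDemo
{
    // Returns true if an add attempted from the dispatcher thread itself timed out.
    public static bool AddFromDispatcherTimesOut()
    {
        // Assumption: a bounded queue with capacity 1 stands in for the real one.
        var queue = new BlockingCollection<Action>(boundedCapacity: 1);
        var timedOut = false;
        var done = new ManualResetEventSlim();

        // The only consumer of the queue.
        var dispatcher = new Thread(() =>
        {
            foreach (var action in queue.GetConsumingEnumerable())
            {
                action();
            }
        })
        { IsBackground = true };
        dispatcher.Start();

        queue.Add(() =>
        {
            // This body runs on the dispatcher thread.
            // The queue is empty right now, so this add fills it.
            queue.Add(() => { });

            // The queue is full and its only consumer is busy running this
            // very action, so the add cannot succeed.
            timedOut = !queue.TryAdd(() => { }, TimeSpan.FromMilliseconds(200));
            done.Set();
        });

        done.Wait();
        queue.CompleteAdding();
        return timedOut;
    }
}
```

Calling `DispatcherDeadlockDemo.AddFromDispatcherTimesOut()` reports whether the inner add failed to find room within the timeout.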

Exact cause

I believe we have found the exact cause of the problem. ClientCommandDispatcherSingleton.InvokeAsync calls TaskHelpers.TrySetResultSafe, and sometimes continuations are executed synchronously as part of this call. This is a known trap of the TPL, and judging by the existence of TaskHelpers.TrySetResultSafe, you are well aware of it.

I understand that TaskHelpers.TrySetResultSafe tries to avoid hijacking the thread by calling Task.Run. However, Task.Run doesn't guarantee that the code will run on a different thread, and I believe the deadlock occurs exactly when this Task.Run executes synchronously.
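The TPL behaviour in question can be observed without EasyNetQ. The sketch below is a hypothetical demo (the `InlineContinuationDemo` name is an assumption): it awaits a TaskCompletionSource task, completes it from a dedicated thread, and checks whether the code after the await ran on that same thread. On a plain thread with no SynchronizationContext and the default task scheduler, the continuation is normally inlined into the TrySetResult call.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class InlineContinuationDemo
{
    // Returns true when the code after `await` ran on the thread that called TrySetResult.
    public static bool ContinuationRunsOnCompletingThread()
    {
        var tcs = new TaskCompletionSource<int>();
        int continuationThreadId = -1;
        int completerThreadId = -2;

        async Task AwaitAndRecordAsync()
        {
            await tcs.Task.ConfigureAwait(false);
            // Everything from here on is the continuation.
            continuationThreadId = Environment.CurrentManagedThreadId;
        }

        // The method runs synchronously up to the await, so the continuation
        // is registered before the completing thread starts.
        Task awaiting = AwaitAndRecordAsync();

        // Stands in for a dedicated dispatcher thread that completes the task.
        var completer = new Thread(() =>
        {
            completerThreadId = Environment.CurrentManagedThreadId;
            tcs.TrySetResult(42);
        });
        completer.Start();
        completer.Join();
        awaiting.Wait();

        return continuationThreadId == completerThreadId;
    }
}
```

If the continuation then blocks, for example on a full bounded queue, it is the completing thread that blocks.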

Reproducing

I created a minimal console app that reproduces the problem. The algorithm is:

  • Subscribe the consumer via SubscribeAsync
  • Publish 10k messages from the main thread
  • The consumer sends three more messages via await PublishAsync
  • The main thread sleeps for a minute, giving EasyNetQ time to consume the messages
  • Usually within 15 seconds, one of the PublishAsync calls from the consumer is executed on the Client Command Dispatcher Thread and processing of new messages freezes.

Obviously, one of the continuations of await PublishAsync inside the consumer was executed synchronously. This is the stack trace of the Client Command Dispatcher Thread from the dump I created when the app froze:

ntdll.dll!776a90bc()
[Frames below may be incorrect and/or missing, no symbols loaded for ntdll.dll]
KERNELBASE.dll!74ae1556()
mscorlib.ni.dll!728c5fef()
[Managed to Native Transition]
mscorlib.dll!System.Threading.Monitor.Wait(object obj, int millisecondsTimeout, bool exitContext)
mscorlib.dll!System.Threading.SemaphoreSlim.WaitUntilCountOrTimeout(int millisecondsTimeout, uint startTime, System.Threading.CancellationToken cancellationToken)
mscorlib.dll!System.Threading.SemaphoreSlim.Wait(int millisecondsTimeout, System.Threading.CancellationToken cancellationToken)
System.dll!System.Collections.Concurrent.BlockingCollection<System.Action>.TryAddWithNoTimeValidation(System.Action item, int millisecondsTimeout, System.Threading.CancellationToken cancellationToken)
EasyNetQ.dll!EasyNetQ.Producer.ClientCommandDispatcherSingleton.InvokeAsync<EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct>(System.Func<RabbitMQ.Client.IModel, EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct> channelAction)
EasyNetQ.dll!EasyNetQ.Producer.ClientCommandDispatcherSingleton.InvokeAsync(System.Action<RabbitMQ.Client.IModel> channelAction)
EasyNetQ.dll!EasyNetQ.Producer.ClientCommandDispatcher.InvokeAsync(System.Action<RabbitMQ.Client.IModel> channelAction)
EasyNetQ.dll!EasyNetQ.RabbitAdvancedBus.PublishAsync(EasyNetQ.Topology.IExchange exchange, string routingKey, bool mandatory, EasyNetQ.MessageProperties messageProperties, byte[] body)
mscorlib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder.Start<EasyNetQ.RabbitAdvancedBus.<PublishAsync>d__27>(ref EasyNetQ.RabbitAdvancedBus.<PublishAsync>d__27 stateMachine)
EasyNetQ.dll!EasyNetQ.RabbitAdvancedBus.PublishAsync(EasyNetQ.Topology.IExchange exchange, string routingKey, bool mandatory, EasyNetQ.MessageProperties messageProperties, byte[] body)
EasyNetQ.dll!EasyNetQ.RabbitAdvancedBus.PublishAsync<ReproduceThreadSteale.TestMessage>(EasyNetQ.Topology.IExchange exchange, string routingKey, bool mandatory, EasyNetQ.IMessage<ReproduceThreadSteale.TestMessage> message)
EasyNetQ.dll!EasyNetQ.RabbitBus.PublishAsync<ReproduceThreadSteale.TestMessage>(ReproduceThreadSteale.TestMessage message, System.Action<EasyNetQ.FluentConfiguration.IPublishConfiguration> configure)
mscorlib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder.Start<EasyNetQ.RabbitBus.<PublishAsync>d__13<ReproduceThreadSteale.TestMessage>>(ref EasyNetQ.RabbitBus.<PublishAsync>d__13<ReproduceThreadSteale.TestMessage> stateMachine)
EasyNetQ.dll!EasyNetQ.RabbitBus.PublishAsync<ReproduceThreadSteale.TestMessage>(ReproduceThreadSteale.TestMessage message, System.Action<EasyNetQ.FluentConfiguration.IPublishConfiguration> configure)
EasyNetQ.dll!EasyNetQ.RabbitBus.PublishAsync<ReproduceThreadSteale.TestMessage>(ReproduceThreadSteale.TestMessage message, string topic)
EasyNetQ.dll!EasyNetQ.RabbitBus.PublishAsync<ReproduceThreadSteale.TestMessage>(ReproduceThreadSteale.TestMessage message)
ReproduceThreadSteale.exe!ReproduceThreadSteale.Program.Consume(ReproduceThreadSteale.TestMessage msg) Line 41
kernel32.dll!76723744()
ntdll.dll!77699e54()
ntdll.dll!77699e1f()
[Resuming Async Method]
mscorlib.dll!System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.InvokeMoveNext(object stateMachine)
mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
mscorlib.dll!System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.Run()
mscorlib.dll!System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Action action, bool allowInlining, ref System.Threading.Tasks.Task currentTask)
mscorlib.dll!System.Threading.Tasks.Task.FinishContinuations()
mscorlib.dll!System.Threading.Tasks.Task.FinishStageThree()
mscorlib.dll!System.Threading.Tasks.Task<System.Threading.Tasks.VoidTaskResult>.TrySetResult(System.Threading.Tasks.VoidTaskResult result)
mscorlib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.Threading.Tasks.VoidTaskResult>.SetResult(System.Threading.Tasks.VoidTaskResult result)
mscorlib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder.SetResult()
EasyNetQ.dll!EasyNetQ.RabbitBus.PublishAsync<ReproduceThreadSteale.TestMessage>(ReproduceThreadSteale.TestMessage message, System.Action<EasyNetQ.FluentConfiguration.IPublishConfiguration> configure)
mscorlib.dll!System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.InvokeMoveNext(object stateMachine)
mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
mscorlib.dll!System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.Run()
mscorlib.dll!System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Action action, bool allowInlining, ref System.Threading.Tasks.Task currentTask)
mscorlib.dll!System.Threading.Tasks.Task.FinishContinuations()
mscorlib.dll!System.Threading.Tasks.Task.FinishStageThree()
mscorlib.dll!System.Threading.Tasks.Task<System.Threading.Tasks.VoidTaskResult>.TrySetResult(System.Threading.Tasks.VoidTaskResult result)
mscorlib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.Threading.Tasks.VoidTaskResult>.SetResult(System.Threading.Tasks.VoidTaskResult result)
mscorlib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder.SetResult()
EasyNetQ.dll!EasyNetQ.RabbitAdvancedBus.PublishAsync(EasyNetQ.Topology.IExchange exchange, string routingKey, bool mandatory, EasyNetQ.MessageProperties messageProperties, byte[] body)
mscorlib.dll!System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.InvokeMoveNext(object stateMachine)
mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
mscorlib.dll!System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.Run()
mscorlib.dll!System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Action action, bool allowInlining, ref System.Threading.Tasks.Task currentTask)
mscorlib.dll!System.Threading.Tasks.Task.FinishContinuations()
mscorlib.dll!System.Threading.Tasks.Task.FinishStageThree()
mscorlib.dll!System.Threading.Tasks.Task<EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct>.TrySetResult(EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct result)
mscorlib.dll!System.Threading.Tasks.TaskCompletionSource<EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct>.TrySetResult(EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct result)
EasyNetQ.dll!EasyNetQ.Internals.TaskHelpers.TrySetResultSafe<EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct>(System.Threading.Tasks.TaskCompletionSource<EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct> source, EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct result)
EasyNetQ.dll!EasyNetQ.Producer.ClientCommandDispatcherSingleton.InvokeAsync.AnonymousMethod__1(RabbitMQ.Client.IModel channel)
EasyNetQ.dll!EasyNetQ.Producer.PersistentChannel.InvokeChannelAction(System.Action<RabbitMQ.Client.IModel> channelAction)
EasyNetQ.dll!EasyNetQ.Producer.ClientCommandDispatcherSingleton.InvokeAsync.AnonymousMethod__0()
EasyNetQ.dll!EasyNetQ.Producer.ClientCommandDispatcherSingleton.StartDispatcherThread.AnonymousMethod__10_0()
mscorlib.dll!System.Threading.ThreadHelper.ThreadStart_Context(object state)
mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state)
mscorlib.dll!System.Threading.ThreadHelper.ThreadStart()
[Native to Managed Transition]

At the bottom of the stack trace we see continuations being invoked synchronously from TaskCompletionSource.TrySetResult, which leads to RabbitAdvancedBus.PublishAsync and finally to the Program.Consume method:

...
ReproduceThreadSteale.exe!ReproduceThreadSteale.Program.Consume(ReproduceThreadSteale.TestMessage msg) Line 41
...
EasyNetQ.dll!EasyNetQ.RabbitAdvancedBus.PublishAsync(EasyNetQ.Topology.IExchange exchange, string routingKey, bool mandatory, EasyNetQ.MessageProperties messageProperties, byte[] body)
mscorlib.dll!System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.InvokeMoveNext(object stateMachine)
mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
mscorlib.dll!System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.Run()
mscorlib.dll!System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Action action, bool allowInlining, ref System.Threading.Tasks.Task currentTask)
mscorlib.dll!System.Threading.Tasks.Task.FinishContinuations()
mscorlib.dll!System.Threading.Tasks.Task.FinishStageThree()
mscorlib.dll!System.Threading.Tasks.Task<EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct>.TrySetResult(EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct result)
mscorlib.dll!System.Threading.Tasks.TaskCompletionSource<EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct>.TrySetResult(EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct result)
EasyNetQ.dll!EasyNetQ.Internals.TaskHelpers.TrySetResultSafe<EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct>(System.Threading.Tasks.TaskCompletionSource<EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct> source, EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct result)
EasyNetQ.dll!EasyNetQ.Producer.ClientCommandDispatcherSingleton.InvokeAsync.AnonymousMethod__1(RabbitMQ.Client.IModel channel)
EasyNetQ.dll!EasyNetQ.Producer.PersistentChannel.InvokeChannelAction(System.Action<RabbitMQ.Client.IModel> channelAction)
EasyNetQ.dll!EasyNetQ.Producer.ClientCommandDispatcherSingleton.InvokeAsync.AnonymousMethod__0()
EasyNetQ.dll!EasyNetQ.Producer.ClientCommandDispatcherSingleton.StartDispatcherThread.AnonymousMethod__10_0()
mscorlib.dll!System.Threading.ThreadHelper.ThreadStart_Context(object state)
...

Fix

I can think of three possible fixes:

  1. Do not call TrySetResult on the Client Command Dispatcher Thread
  2. Call TrySetResult on the Client Command Dispatcher Thread, but ensure that continuations are executed on other threads
  3. Replace the Client Command Dispatcher Thread with a critical section around persistentChannel

The first way requires another queue for storing results and another thread to consume it. If this queue has no capacity limit, nothing can block on .Add and the deadlock cannot happen.

I took the second way from Stack Overflow: http://stackoverflow.com/a/22588431/458723. It requires a very dirty hack involving cancelling a Task, and it is just too dirty for me 😃 However, .NET 4.6 has TaskCreationOptions.RunContinuationsAsynchronously, which is the option I would prefer, but we would still need to fix the issue for EasyNetQ users on .NET 4.5.
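For the .NET 4.6+ option, a minimal sketch of what TaskCreationOptions.RunContinuationsAsynchronously changes (the `AsyncContinuationDemo` name is hypothetical, and this is not the change made in EasyNetQ): the TaskCompletionSource is created with the flag, completed from a dedicated thread, and the code after the await is expected to run on some other thread.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncContinuationDemo
{
    // Returns true when the code after `await` ran on the thread that called TrySetResult.
    public static bool ContinuationRunsOnCompletingThread()
    {
        // The flag asks the TPL to schedule continuations instead of inlining them.
        var tcs = new TaskCompletionSource<int>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        int continuationThreadId = -1;
        int completerThreadId = -2;

        async Task AwaitAndRecordAsync()
        {
            await tcs.Task.ConfigureAwait(false);
            continuationThreadId = Environment.CurrentManagedThreadId;
        }

        // Registers the continuation before the completing thread starts.
        Task awaiting = AwaitAndRecordAsync();

        var completer = new Thread(() =>
        {
            completerThreadId = Environment.CurrentManagedThreadId;
            tcs.TrySetResult(42);
            // Stay alive until the continuation has finished elsewhere,
            // so the two thread ids are compared while both threads exist.
            awaiting.Wait();
        });
        completer.Start();
        completer.Join();

        return continuationThreadId == completerThreadId;
    }
}
```

With the flag set, the completing thread returns from TrySetResult without running the awaiting code itself.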

The third way is very easy to implement with SemaphoreSlim.WaitAsync. I implemented it, and we have switched to our own custom build of EasyNetQ with this fix, because avoiding freezes is critical for us. My concern is that there may have been a specific reason, one that I don't understand, for implementing the Client Command Dispatcher Thread instead of a critical section.
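A rough sketch of the critical-section idea, assuming a generic channel type in place of RabbitMQ's IModel. `SerializedChannelInvoker` is a hypothetical helper written for illustration; it is neither an EasyNetQ type nor the custom build mentioned above.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: serializes access to a channel without a dedicated thread.
public sealed class SerializedChannelInvoker<TChannel>
{
    // A semaphore with a single permit acts as an async-friendly critical section.
    private readonly SemaphoreSlim gate = new SemaphoreSlim(1, 1);
    private readonly TChannel channel;

    public SerializedChannelInvoker(TChannel channel)
    {
        this.channel = channel;
    }

    public async Task InvokeAsync(Action<TChannel> channelAction)
    {
        // Waiting asynchronously does not block the calling thread.
        await gate.WaitAsync().ConfigureAwait(false);
        try
        {
            channelAction(channel);
        }
        finally
        {
            // The permit is released before the returned task completes, so a
            // caller's continuation can call InvokeAsync again without deadlocking.
            gate.Release();
        }
    }
}
```

Note that SemaphoreSlim is not re-entrant: a channelAction that synchronously waits on another InvokeAsync call on the same instance would still block.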

Sorry for the long report 😃

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Reactions: 7
  • Comments: 18 (15 by maintainers)

Top GitHub Comments

1 reaction
harrisonmeister commented, Jul 11, 2019

https://github.com/EasyNetQ/EasyNetQ/pull/967 should fix this, at least for EasyNetQ consumers targeting net46+.

1 reaction
harrisonmeister commented, Jul 11, 2019

Thanks @flyingpie! I’ll take a look at both options
