Client command dispatcher thread deadlock
Hi!
Symptoms
For the past week, one of our services has been stopping processing messages from RabbitMQ at random times. It doesn’t consume CPU or memory when this happens, and the connection between the service and RabbitMQ stays alive, but the service doesn’t consume new messages. It happens every day on all replicas of the service, and fixing it is very important to us because this service is essential for our product.
The first few times we just restarted the service, but the issue kept coming back. The last time, we took a process dump and it held a surprise: the Client Command Dispatcher Thread was executing our message-processing code. That code publishes other messages, and in the dump the Client Command Dispatcher Thread was blocked in the RabbitBus.PublishAsync method, trying to add an action to the queue inside ClientCommandDispatcherSingleton. However, the Client Command Dispatcher Thread is the one that is supposed to take items from this queue, so once the queue is full nobody can drain it, and this is how we get a deadlock.
Exact cause
I believe we found the exact cause of the problem. ClientCommandDispatcherSingleton.InvokeAsync calls TaskHelpers.TrySetResultSafe, and sometimes continuations are executed synchronously as part of this call. This is a known trap of the TPL, and judging by the existence of TaskHelpers.TrySetResultSafe you’re perfectly aware of it.
I understand that TaskHelpers.TrySetResultSafe tries to avoid hijacking of the thread by calling Task.Run. However, Task.Run doesn’t guarantee that the code will be executed on a different thread, and I believe we get a deadlock exactly when this Task.Run executes synchronously.
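To make the mechanism concrete, here is a minimal sketch of the trap, not EasyNetQ's actual code. ModelDispatcher is a hypothetical stand-in for a dispatcher thread draining a bounded BlockingCollection, and all names in it are assumptions made for illustration. It only shows that the code after await resumes on the dispatcher thread when the dispatcher itself calls TrySetResult; a blocking Add from that continuation against a full queue could then never be served.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical model of a single dispatcher thread with a bounded queue.
public sealed class ModelDispatcher : IDisposable
{
    private readonly BlockingCollection<Action> queue =
        new BlockingCollection<Action>(boundedCapacity: 1);
    private readonly Thread thread;

    public ModelDispatcher()
    {
        thread = new Thread(() =>
        {
            // The only consumer of the queue.
            foreach (var action in queue.GetConsumingEnumerable()) action();
        });
        thread.IsBackground = true;
        thread.Start();
    }

    public bool IsDispatcherThread
    {
        get { return ReferenceEquals(Thread.CurrentThread, thread); }
    }

    public Task InvokeAsync(Action work)
    {
        var tcs = new TaskCompletionSource<bool>();
        // TrySetResult is called on the dispatcher thread, so awaiting
        // continuations may be inlined onto that thread.
        queue.Add(() => { work(); tcs.TrySetResult(true); });
        return tcs.Task;
    }

    public void Dispose()
    {
        queue.CompleteAdding();
        thread.Join();
        queue.Dispose();
    }
}

public static class DeadlockModel
{
    // Returns true if the code after `await` ran on the dispatcher thread.
    public static bool ContinuationRunsOnDispatcher()
    {
        using (var dispatcher = new ModelDispatcher())
        using (var gate = new ManualResetEventSlim())
        {
            bool onDispatcher = false;
            Func<Task> consumer = async () =>
            {
                // The work blocks on the gate, so the await below always
                // registers its continuation before the task completes.
                await dispatcher.InvokeAsync(() => gate.Wait()).ConfigureAwait(false);
                onDispatcher = dispatcher.IsDispatcherThread;
                // A blocking queue.Add issued here with a full queue would
                // wait forever: this thread is the only one that drains it.
            };

            Task pending = consumer();
            gate.Set();
            pending.Wait();
            return onDispatcher;
        }
    }
}
```

In this model DeadlockModel.ContinuationRunsOnDispatcher() should return true, because the dispatcher thread has no SynchronizationContext and the continuation is therefore eligible for inlining.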
Reproducing
I created a minimal console app which reproduces the problem. The algorithm is:
- Subscribe the consumer via SubscribeAsync
- Publish 10k messages in the main thread
- The consumer sends three more messages via await PublishAsync
- The main thread sleeps for a minute, giving EasyNetQ time to consume the messages
- Usually within 15 seconds one of the PublishAsync calls from the consumer is executed on the Client Command Dispatcher Thread and processing of new messages freezes.
Obviously, one of the continuations of await PublishAsync inside the consumer was executed synchronously. This is a stack trace of the Client Command Dispatcher Thread from the dump I created when the app froze:
ntdll.dll!776a90bc()
[Frames below may be incorrect and/or missing, no symbols loaded for ntdll.dll]
KERNELBASE.dll!74ae1556()
mscorlib.ni.dll!728c5fef()
[Managed to Native Transition]
mscorlib.dll!System.Threading.Monitor.Wait(object obj, int millisecondsTimeout, bool exitContext)
mscorlib.dll!System.Threading.SemaphoreSlim.WaitUntilCountOrTimeout(int millisecondsTimeout, uint startTime, System.Threading.CancellationToken cancellationToken)
mscorlib.dll!System.Threading.SemaphoreSlim.Wait(int millisecondsTimeout, System.Threading.CancellationToken cancellationToken)
System.dll!System.Collections.Concurrent.BlockingCollection<System.Action>.TryAddWithNoTimeValidation(System.Action item, int millisecondsTimeout, System.Threading.CancellationToken cancellationToken)
EasyNetQ.dll!EasyNetQ.Producer.ClientCommandDispatcherSingleton.InvokeAsync<EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct>(System.Func<RabbitMQ.Client.IModel, EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct> channelAction)
EasyNetQ.dll!EasyNetQ.Producer.ClientCommandDispatcherSingleton.InvokeAsync(System.Action<RabbitMQ.Client.IModel> channelAction)
EasyNetQ.dll!EasyNetQ.Producer.ClientCommandDispatcher.InvokeAsync(System.Action<RabbitMQ.Client.IModel> channelAction)
EasyNetQ.dll!EasyNetQ.RabbitAdvancedBus.PublishAsync(EasyNetQ.Topology.IExchange exchange, string routingKey, bool mandatory, EasyNetQ.MessageProperties messageProperties, byte[] body)
mscorlib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder.Start<EasyNetQ.RabbitAdvancedBus.<PublishAsync>d__27>(ref EasyNetQ.RabbitAdvancedBus.<PublishAsync>d__27 stateMachine)
EasyNetQ.dll!EasyNetQ.RabbitAdvancedBus.PublishAsync(EasyNetQ.Topology.IExchange exchange, string routingKey, bool mandatory, EasyNetQ.MessageProperties messageProperties, byte[] body)
EasyNetQ.dll!EasyNetQ.RabbitAdvancedBus.PublishAsync<ReproduceThreadSteale.TestMessage>(EasyNetQ.Topology.IExchange exchange, string routingKey, bool mandatory, EasyNetQ.IMessage<ReproduceThreadSteale.TestMessage> message)
EasyNetQ.dll!EasyNetQ.RabbitBus.PublishAsync<ReproduceThreadSteale.TestMessage>(ReproduceThreadSteale.TestMessage message, System.Action<EasyNetQ.FluentConfiguration.IPublishConfiguration> configure)
mscorlib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder.Start<EasyNetQ.RabbitBus.<PublishAsync>d__13<ReproduceThreadSteale.TestMessage>>(ref EasyNetQ.RabbitBus.<PublishAsync>d__13<ReproduceThreadSteale.TestMessage> stateMachine)
EasyNetQ.dll!EasyNetQ.RabbitBus.PublishAsync<ReproduceThreadSteale.TestMessage>(ReproduceThreadSteale.TestMessage message, System.Action<EasyNetQ.FluentConfiguration.IPublishConfiguration> configure)
EasyNetQ.dll!EasyNetQ.RabbitBus.PublishAsync<ReproduceThreadSteale.TestMessage>(ReproduceThreadSteale.TestMessage message, string topic)
EasyNetQ.dll!EasyNetQ.RabbitBus.PublishAsync<ReproduceThreadSteale.TestMessage>(ReproduceThreadSteale.TestMessage message)
ReproduceThreadSteale.exe!ReproduceThreadSteale.Program.Consume(ReproduceThreadSteale.TestMessage msg) Line 41
kernel32.dll!76723744()
ntdll.dll!77699e54()
ntdll.dll!77699e1f()
[Resuming Async Method]
mscorlib.dll!System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.InvokeMoveNext(object stateMachine)
mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
mscorlib.dll!System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.Run()
mscorlib.dll!System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Action action, bool allowInlining, ref System.Threading.Tasks.Task currentTask)
mscorlib.dll!System.Threading.Tasks.Task.FinishContinuations()
mscorlib.dll!System.Threading.Tasks.Task.FinishStageThree()
mscorlib.dll!System.Threading.Tasks.Task<System.Threading.Tasks.VoidTaskResult>.TrySetResult(System.Threading.Tasks.VoidTaskResult result)
mscorlib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.Threading.Tasks.VoidTaskResult>.SetResult(System.Threading.Tasks.VoidTaskResult result)
mscorlib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder.SetResult()
EasyNetQ.dll!EasyNetQ.RabbitBus.PublishAsync<ReproduceThreadSteale.TestMessage>(ReproduceThreadSteale.TestMessage message, System.Action<EasyNetQ.FluentConfiguration.IPublishConfiguration> configure)
mscorlib.dll!System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.InvokeMoveNext(object stateMachine)
mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
mscorlib.dll!System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.Run()
mscorlib.dll!System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Action action, bool allowInlining, ref System.Threading.Tasks.Task currentTask)
mscorlib.dll!System.Threading.Tasks.Task.FinishContinuations()
mscorlib.dll!System.Threading.Tasks.Task.FinishStageThree()
mscorlib.dll!System.Threading.Tasks.Task<System.Threading.Tasks.VoidTaskResult>.TrySetResult(System.Threading.Tasks.VoidTaskResult result)
mscorlib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.Threading.Tasks.VoidTaskResult>.SetResult(System.Threading.Tasks.VoidTaskResult result)
mscorlib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder.SetResult()
EasyNetQ.dll!EasyNetQ.RabbitAdvancedBus.PublishAsync(EasyNetQ.Topology.IExchange exchange, string routingKey, bool mandatory, EasyNetQ.MessageProperties messageProperties, byte[] body)
mscorlib.dll!System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.InvokeMoveNext(object stateMachine)
mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
mscorlib.dll!System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.Run()
mscorlib.dll!System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Action action, bool allowInlining, ref System.Threading.Tasks.Task currentTask)
mscorlib.dll!System.Threading.Tasks.Task.FinishContinuations()
mscorlib.dll!System.Threading.Tasks.Task.FinishStageThree()
mscorlib.dll!System.Threading.Tasks.Task<EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct>.TrySetResult(EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct result)
mscorlib.dll!System.Threading.Tasks.TaskCompletionSource<EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct>.TrySetResult(EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct result)
EasyNetQ.dll!EasyNetQ.Internals.TaskHelpers.TrySetResultSafe<EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct>(System.Threading.Tasks.TaskCompletionSource<EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct> source, EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct result)
EasyNetQ.dll!EasyNetQ.Producer.ClientCommandDispatcherSingleton.InvokeAsync.AnonymousMethod__1(RabbitMQ.Client.IModel channel)
EasyNetQ.dll!EasyNetQ.Producer.PersistentChannel.InvokeChannelAction(System.Action<RabbitMQ.Client.IModel> channelAction)
EasyNetQ.dll!EasyNetQ.Producer.ClientCommandDispatcherSingleton.InvokeAsync.AnonymousMethod__0()
EasyNetQ.dll!EasyNetQ.Producer.ClientCommandDispatcherSingleton.StartDispatcherThread.AnonymousMethod__10_0()
mscorlib.dll!System.Threading.ThreadHelper.ThreadStart_Context(object state)
mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state)
mscorlib.dll!System.Threading.ThreadHelper.ThreadStart()
[Native to Managed Transition]
At the bottom of the stack trace we see a synchronous call of continuations from TaskCompletionSource.TrySetResult, which leads us to RabbitAdvancedBus.PublishAsync and finally to the Program.Consume method:
...
ReproduceThreadSteale.exe!ReproduceThreadSteale.Program.Consume(ReproduceThreadSteale.TestMessage msg) Line 41
...
EasyNetQ.dll!EasyNetQ.RabbitAdvancedBus.PublishAsync(EasyNetQ.Topology.IExchange exchange, string routingKey, bool mandatory, EasyNetQ.MessageProperties messageProperties, byte[] body)
mscorlib.dll!System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.InvokeMoveNext(object stateMachine)
mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
mscorlib.dll!System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.Run()
mscorlib.dll!System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Action action, bool allowInlining, ref System.Threading.Tasks.Task currentTask)
mscorlib.dll!System.Threading.Tasks.Task.FinishContinuations()
mscorlib.dll!System.Threading.Tasks.Task.FinishStageThree()
mscorlib.dll!System.Threading.Tasks.Task<EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct>.TrySetResult(EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct result)
mscorlib.dll!System.Threading.Tasks.TaskCompletionSource<EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct>.TrySetResult(EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct result)
EasyNetQ.dll!EasyNetQ.Internals.TaskHelpers.TrySetResultSafe<EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct>(System.Threading.Tasks.TaskCompletionSource<EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct> source, EasyNetQ.Producer.ClientCommandDispatcherSingleton.NoContentStruct result)
EasyNetQ.dll!EasyNetQ.Producer.ClientCommandDispatcherSingleton.InvokeAsync.AnonymousMethod__1(RabbitMQ.Client.IModel channel)
EasyNetQ.dll!EasyNetQ.Producer.PersistentChannel.InvokeChannelAction(System.Action<RabbitMQ.Client.IModel> channelAction)
EasyNetQ.dll!EasyNetQ.Producer.ClientCommandDispatcherSingleton.InvokeAsync.AnonymousMethod__0()
EasyNetQ.dll!EasyNetQ.Producer.ClientCommandDispatcherSingleton.StartDispatcherThread.AnonymousMethod__10_0()
mscorlib.dll!System.Threading.ThreadHelper.ThreadStart_Context(object state)
...
Fix
I can think of three possible ways to fix this:
- Do not call TrySetResult on the Client Command Dispatcher Thread
- Call TrySetResult on the Client Command Dispatcher Thread but ensure that continuations are executed on other threads
- Replace the Client Command Dispatcher Thread with a critical section around persistentChannel
The first way requires another queue for storing results and another thread that consumes this queue. If this queue has no capacity limit, then nothing can block on .Add and the deadlock will not happen.
I took the second way from Stack Overflow: http://stackoverflow.com/a/22588431/458723. It requires a very dirty hack with cancelling a Task, and it is just too dirty for me 😃 However, on .NET 4.6 we have TaskCreationOptions.RunContinuationsAsynchronously, which is the best way for me, but we still need to fix the issue for EasyNetQ users on .NET 4.5.
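As a rough illustration of the .NET 4.6 option, the sketch below completes a TaskCompletionSource on a dedicated thread and reports whether the code after await ran on that same thread. ContinuationThreadDemo and its method are hypothetical names invented for this example; only TaskCompletionSource and TaskCreationOptions are real framework APIs.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ContinuationThreadDemo
{
    // Completes the task on a dedicated "completer" thread and reports
    // whether the awaiting continuation ran on that thread.
    public static bool ContinuationRunsOnCompleter(TaskCreationOptions options)
    {
        var tcs = new TaskCompletionSource<bool>(options);
        Thread completer = null;
        bool ranOnCompleter = false;

        Func<Task> awaiter = async () =>
        {
            await tcs.Task.ConfigureAwait(false);
            ranOnCompleter = ReferenceEquals(Thread.CurrentThread, completer);
        };

        completer = new Thread(() => tcs.TrySetResult(true));

        // Register the continuation first, then complete the task.
        Task pending = awaiter();
        completer.Start();
        pending.Wait();
        completer.Join();
        return ranOnCompleter;
    }
}
```

With TaskCreationOptions.None the continuation is normally inlined into TrySetResult, so the method should return true. With TaskCreationOptions.RunContinuationsAsynchronously the continuation is queued to the thread pool instead, so it should return false and the completing thread is never borrowed.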
The third way is super easy to implement with SemaphoreSlim.WaitAsync. I implemented this, and we have now switched to our custom build of EasyNetQ with this fix because it is super critical for us to avoid freezes. I’m afraid there was a special reason, which I can’t see, why the Client Command Dispatcher Thread was implemented instead of a critical section.
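The shape of that third approach can be sketched as follows. This is not the code from our custom build; SerializedInvoker is a hypothetical class that guards an arbitrary action with SemaphoreSlim.WaitAsync, where the real fix would guard access to persistentChannel.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical replacement for the dispatcher thread: callers run the
// action themselves, one at a time, behind an async-friendly lock.
public sealed class SerializedInvoker : IDisposable
{
    private readonly SemaphoreSlim gate = new SemaphoreSlim(1, 1);

    public async Task InvokeAsync(Action action)
    {
        await gate.WaitAsync().ConfigureAwait(false);
        try
        {
            action();
        }
        finally
        {
            // Released before the caller's continuation resumes, so the
            // caller may invoke again without deadlocking.
            gate.Release();
        }
    }

    public void Dispose()
    {
        gate.Dispose();
    }
}

public static class SerializedInvokerDemo
{
    // Runs `callers` concurrent invocations and returns the largest number
    // of actions observed inside the critical section at the same time.
    public static int MaxConcurrency(int callers)
    {
        using (var invoker = new SerializedInvoker())
        {
            int current = 0;
            int max = 0;
            var tasks = new Task[callers];
            for (int i = 0; i < callers; i++)
            {
                tasks[i] = Task.Run(() => invoker.InvokeAsync(() =>
                {
                    int now = Interlocked.Increment(ref current);
                    if (now > max) max = now; // only one caller is inside at a time
                    Thread.Sleep(1);
                    Interlocked.Decrement(ref current);
                }));
            }
            Task.WaitAll(tasks);
            return max;
        }
    }
}
```

There is no dedicated thread and no bounded queue here, so a continuation has nothing to hijack. One caveat of this sketch: an action that itself calls InvokeAsync and blocks on the result would still deadlock, because SemaphoreSlim is not reentrant.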
Sorry for the long report 😃
Top GitHub Comments
https://github.com/EasyNetQ/EasyNetQ/pull/967 should fix this, at least for EasyNetQ consumers targeting .NET Framework 4.6+ (net46+).
Thanks @flyingpie! I’ll take a look at both options