[BUG] [Microsoft.Azure.ServiceBus] Closing MessageReceiver Does not Always Close inner ReceivingAmqpLink
Describe the bug
During periods when a large number of server-side errors occur, we sometimes see messages getting “stuck” in queues: they sit in the queue for the configured message lock timeout before being redelivered.
After some attempts to reproduce with a smaller example, I have found that, in certain scenarios, when calling MessageReceiver.CloseAsync(), the inner ReceivingAmqpLink is not actually closed. The link then sits in the background, continually picking up messages.
The only way I was able to reproduce this is by closing the receiver and opening a new one when an error arrives on the ExceptionHandler. My best guess as to why this occurs: when the inner link faults, it auto-recovers in OnReceiveAsync(), so there may be a race condition between auto-recovery and closing the receiver at around the same time.
Of course, there may also be something completely off with my usage of the SDK here.
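For readers who do not want to open the repo, the close-and-reopen pattern described above looks roughly like the following. This is a minimal sketch under assumptions, not the code from the reproduction repo or from MassTransit: the class name RecyclingReceiver, its fields, and the single-lock recycle logic are hypothetical, and the connection string and queue name are placeholders supplied by the caller. Only the MessageReceiver, MessageHandlerOptions, RegisterMessageHandler, CompleteAsync and CloseAsync calls come from the SDK.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;
using Microsoft.Azure.ServiceBus.Core;

// Hypothetical wrapper: closes the current receiver and opens a new one
// whenever the SDK reports an error through the exception handler.
public sealed class RecyclingReceiver
{
    private readonly string _connectionString; // placeholder, supplied by the caller
    private readonly string _queueName;        // placeholder, supplied by the caller
    private readonly SemaphoreSlim _recycleLock = new SemaphoreSlim(1, 1);
    private MessageReceiver _receiver;

    public RecyclingReceiver(string connectionString, string queueName)
    {
        _connectionString = connectionString;
        _queueName = queueName;
    }

    public void Start()
    {
        var receiver = new MessageReceiver(_connectionString, _queueName, ReceiveMode.PeekLock);
        _receiver = receiver;

        // The exception handler captures the receiver it belongs to, so a late
        // error from an already replaced receiver can be told apart.
        var options = new MessageHandlerOptions(args => OnExceptionAsync(receiver, args))
        {
            AutoComplete = false,
            MaxConcurrentCalls = 1
        };

        receiver.RegisterMessageHandler(
            async (message, cancellationToken) =>
            {
                Console.WriteLine($"Received {message.MessageId}");
                await receiver.CompleteAsync(message.SystemProperties.LockToken);
            },
            options);
    }

    private async Task OnExceptionAsync(MessageReceiver faulted, ExceptionReceivedEventArgs args)
    {
        Console.WriteLine($"Receiver error: {args.Exception.Message}");

        await _recycleLock.WaitAsync();
        try
        {
            // Ignore errors reported by a receiver that was already recycled.
            if (!ReferenceEquals(faulted, _receiver))
            {
                return;
            }

            // Expectation: after this call the inner ReceivingAmqpLink is closed too.
            // The report is that, under repeated server-side errors, it sometimes is not.
            await faulted.CloseAsync();
            Start();
        }
        finally
        {
            _recycleLock.Release();
        }
    }
}
```

If the guess about the race is right, the window would be between the SDK's own link recovery and the CloseAsync() call in OnExceptionAsync; the sketch does nothing to close that window, it only shows where the two could overlap.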
Expected behavior
In general, I would expect that CloseAsync() would always close the inner ReceivingAmqpLink.
To Reproduce
Reproduction repo: https://github.com/paulsavides/ServiceBusTesting
ReproProject is the project that reproduces this issue. If the code is doing something extremely incorrect, please let me know. We actually use the MassTransit library to interact with AzureServiceBus, so I had to recreate a bit of what it does in order to reproduce the error.
1. Open the solution from the reproduction repo
2. Set ReproProject as the startup project
3. Fill in Endpoint & Shared Access Key Signature in Program.cs
4. Run the project
5. While the project is running, open the Queue in the Azure UI & continually update the Auto-delete after idle setting.
   - This is an attempt to cause errors that require the links to be recreated
   - You can view this video to see exactly what I mean if the instructions are unclear: https://www.youtube.com/watch?v=sv0bRozEevs
6. Eventually, in the console output, you will see errors coming through the exception handler & the receiver will ‘recycle’ some number of times
7. After a recycle, you should start seeing message sends & receives being mismatched
   - If not, go back to step 5
8. Press d to print out diagnostics on all of the links from “closed” receivers that are still open & the number of unsettled messages on those links
Environment:
- Microsoft.Azure.ServiceBus 5.0.0
- .net sdk 3.1.102, Microsoft.NETCore.App 3.1.9
- Visual Studio 16.8.1
- I have verified that the issue occurs on the AzureServiceBus standard tier; I believe I have seen it on the premium tier as well.
Please let me know if you require any clarification from me.
Thank you for taking the time to look into this,
Paul Savides
Thank you Josh!
Hello @DorothySun216,
I have updated my reproduction repo listed above to version 5.1.1 and was no longer able to reproduce the issue. Additionally, we have been directly using Microsoft.Azure.ServiceBus v2.4.9 in our services for around two months now and have not seen the issue reproduce.
Thank you, have a wonderful day!