[BUG] Java Service Bus async receiver stops receiving new messages after 'Transient error occurred'
Describe the bug
We upgraded a Service Bus async receiver component from azure-servicebus 3.4.0 to azure-messaging-servicebus 7.0.0. The new version consistently (every 2 to 8 hours) stops processing new messages exactly ten minutes after the previous event and never recovers.
After restarting the component, queued messages are processed normally. Messages generally arrive on our queue at intervals of 1 to 30 minutes. The Service Bus library does not fail after every ten-minute gap, but whenever the error occurs, it follows a ten-minute gap.
Exception or Stack Trace
2021-01-29 06:26:20,428 [boundedElastic-2] INFO <our code> - Message Acked
<exactly ten minutes elapsed without any other log messages>
2021-01-29 06:36:20,583 [single-1] WARN c.a.m.s.i.ServiceBusReceiveLinkProcessor - linkName[n/a] entityPath[n/a]. Transient error occurred. Attempt: 1. Retrying after 4511 ms.
The link 'G12:45185721:eph-messages_e7417e_1611901335622' is force detached. Code: consumer(link184908). Details: AmqpMessageConsumer.IdleTimerExpired: Idle timeout: 00:10:00. TrackingId:15928218000002070002d24c6013a998_G12_B9, SystemTracker:example:Queue:eph-messages, Timestamp:2021-01-29T06:36:20, errorContext[NAMESPACE: example.servicebus.windows.net, PATH: eph-messages, REFERENCE_ID: eph-messages_e7417e_1611901335622, LINK_CREDIT: 0]
2021-01-29 06:36:25,098 [parallel-1] WARN c.a.m.s.i.ServiceBusReceiveLinkProcessor - linkName[n/a] entityPath[n/a]. Transient error occurred. Attempt: 2. Retrying after 14575 ms.
The link 'G12:45185721:eph-messages_e7417e_1611901335622' is force detached. Code: consumer(link184908). Details: AmqpMessageConsumer.IdleTimerExpired: Idle timeout: 00:10:00. TrackingId:15928218000002070002d24c6013a998_G12_B9, SystemTracker:example:Queue:eph-messages, Timestamp:2021-01-29T06:36:20, errorContext[NAMESPACE: example.servicebus.windows.net, PATH: eph-messages, REFERENCE_ID: eph-messages_e7417e_1611901335622, LINK_CREDIT: 0]
2021-01-29 06:46:25,312 [single-1] WARN c.a.m.s.i.ServiceBusReceiveLinkProcessor - linkName[n/a] entityPath[n/a]. Transient error occurred. Attempt: 1. Retrying after 4511 ms.
The link 'G12:45402209:eph-messages_e7417e_1611901335622' is force detached. Code: consumer(link185672). Details: AmqpMessageConsumer.IdleTimerExpired: Idle timeout: 00:10:00. TrackingId:15928218000002070002d5486013ace9_G12_B9, SystemTracker:example:Queue:eph-messages, Timestamp:2021-01-29T06:46:25, errorContext[NAMESPACE: example.servicebus.windows.net, PATH: eph-messages, REFERENCE_ID: eph-messages_e7417e_1611901335622, LINK_CREDIT: 0]
2021-01-29 06:46:29,824 [parallel-1] WARN c.a.m.s.i.ServiceBusReceiveLinkProcessor - linkName[n/a] entityPath[n/a]. Transient error occurred. Attempt: 2. Retrying after 14575 ms.
The link 'G12:45402209:eph-messages_e7417e_1611901335622' is force detached. Code: consumer(link185672). Details: AmqpMessageConsumer.IdleTimerExpired: Idle timeout: 00:10:00. TrackingId:15928218000002070002d5486013ace9_G12_B9, SystemTracker:example:Queue:eph-messages, Timestamp:2021-01-29T06:46:25, errorContext[NAMESPACE: example.servicebus.windows.net, PATH: eph-messages, REFERENCE_ID: eph-messages_e7417e_1611901335622, LINK_CREDIT: 0]
2021-01-29 06:56:29,910 [single-1] WARN c.a.m.s.i.ServiceBusReceiveLinkProcessor - linkName[n/a] entityPath[n/a]. Transient error occurred. Attempt: 1. Retrying after 4511 ms.
The link 'G12:45555741:eph-messages_e7417e_1611901335622' is force detached. Code: consumer(link186205). Details: AmqpMessageConsumer.IdleTimerExpired: Idle timeout: 00:10:00. TrackingId:15928218000002070002d75d6013af45_G12_B9, SystemTracker:example:Queue:eph-messages, Timestamp:2021-01-29T06:56:29, errorContext[NAMESPACE: example.servicebus.windows.net, PATH: eph-messages, REFERENCE_ID: eph-messages_e7417e_1611901335622, LINK_CREDIT: 0]
2021-01-29 06:56:34,423 [parallel-1] WARN c.a.m.s.i.ServiceBusReceiveLinkProcessor - linkName[n/a] entityPath[n/a]. Transient error occurred. Attempt: 2. Retrying after 14575 ms.
The link 'G12:45555741:eph-messages_e7417e_1611901335622' is force detached. Code: consumer(link186205). Details: AmqpMessageConsumer.IdleTimerExpired: Idle timeout: 00:10:00. TrackingId:15928218000002070002d75d6013af45_G12_B9, SystemTracker:example:Queue:eph-messages, Timestamp:2021-01-29T06:56:29, errorContext[NAMESPACE: example.servicebus.windows.net, PATH: eph-messages, REFERENCE_ID: eph-messages_e7417e_1611901335622, LINK_CREDIT: 0]
2021-01-29 07:01:20,520 [single-1] WARN c.a.c.a.i.RequestResponseChannel - Retry #1. Transient error occurred. Retrying after 4511 ms.
The connection was inactive for more than the allowed 300000 milliseconds and is closed by container 'LinkTracker'. TrackingId:bcdcd068e0c64840ada277486e9503bb_G1S1, SystemTracker:gateway5, Timestamp:2021-01-29T07:01:20, errorContext[NAMESPACE: pqmmjeeventhub001-ns.servicebus.windows.net, PATH: $cbs, REFERENCE_ID: cbs:sender, LINK_CREDIT: 98]
2021-01-29 07:01:20,524 [single-1] ERROR c.a.c.a.i.RequestResponseChannel - cbs - Exception in RequestResponse links. Disposing and clearing unconfirmed sends.
The connection was inactive for more than the allowed 300000 milliseconds and is closed by container 'LinkTracker'. TrackingId:bcdcd068e0c64840ada277486e9503bb_G1S1, SystemTracker:gateway5, Timestamp:2021-01-29T07:01:20, errorContext[NAMESPACE: pqmmjeeventhub001-ns.servicebus.windows.net, PATH: $cbs, REFERENCE_ID: cbs:sender, LINK_CREDIT: 98]
2021-01-29 07:01:25,034 [parallel-1] WARN c.a.c.a.i.RequestResponseChannel - Non-retryable error occurred in connection.
2021-01-29 07:06:34,528 [single-1] WARN c.a.m.s.i.ServiceBusReceiveLinkProcessor - linkName[n/a] entityPath[n/a]. Transient error occurred. Attempt: 1. Retrying after 4511 ms.
The link 'G12:45709936:eph-messages_e7417e_1611901335622' is force detached. Code: consumer(link186724). Details: AmqpMessageConsumer.IdleTimerExpired: Idle timeout: 00:10:00. TrackingId:15928218000002070002d9646013b1a2_G12_B9, SystemTracker:example:Queue:eph-messages, Timestamp:2021-01-29T07:06:34, errorContext[NAMESPACE: example.servicebus.windows.net, PATH: eph-messages, REFERENCE_ID: eph-messages_e7417e_1611901335622, LINK_CREDIT: 0]
2021-01-29 07:06:39,040 [parallel-1] WARN c.a.m.s.i.ServiceBusReceiveLinkProcessor - linkName[n/a] entityPath[n/a]. Transient error occurred. Attempt: 2. Retrying after 14575 ms.
The link 'G12:45709936:eph-messages_e7417e_1611901335622' is force detached. Code: consumer(link186724). Details: AmqpMessageConsumer.IdleTimerExpired: Idle timeout: 00:10:00. TrackingId:15928218000002070002d9646013b1a2_G12_B9, SystemTracker:example:Queue:eph-messages, Timestamp:2021-01-29T07:06:34, errorContext[NAMESPACE: example.servicebus.windows.net, PATH: eph-messages, REFERENCE_ID: eph-messages_e7417e_1611901335622, LINK_CREDIT: 0]
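The ten-minute spacing can be checked directly from the timestamps in the log above. The following is a minimal sketch using only java.time; the class name IdleGapCheck and the helper gap are illustrative names, not part of the original report or of the Azure SDK. It assumes the log timestamps follow the layout "yyyy-MM-dd HH:mm:ss,SSS".

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class IdleGapCheck {

    // Assumed timestamp layout of the log lines, e.g. "2021-01-29 06:26:20,428".
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Returns the elapsed time between two log timestamps. */
    static Duration gap(String earlier, String later) {
        return Duration.between(
                LocalDateTime.parse(earlier, LOG_TIME),
                LocalDateTime.parse(later, LOG_TIME));
    }

    public static void main(String[] args) {
        // Last "Message Acked" line to the first force detach.
        System.out.println(gap("2021-01-29 06:26:20,428", "2021-01-29 06:36:20,583").toMillis());
        // Second retry attempt to the next force detach.
        System.out.println(gap("2021-01-29 06:36:25,098", "2021-01-29 06:46:25,312").toMillis());
    }
}
```

Both gaps come out a fraction of a second over 600000 ms, which matches the "Idle timeout: 00:10:00" reported in the detach message.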
To Reproduce
The code is deployed in Azure Kubernetes Service and consistently fails with this error within 2-8 hours.
Code Snippet
Here is how we are connecting to the Service Bus and processing messages:
sbClient = new ServiceBusClientBuilder()
    .connectionString(serviceBusEndPoint)
    .receiver()
    .disableAutoComplete()
    .queueName(serviceBusQueueName)
    .buildAsyncClient();
sbClient.receiveMessages()
    .flatMap(message -> {
        boolean messageProcessedStatus = processMessage(message);
        if (messageProcessedStatus) {
            logger.info("Message Acked");
            return sbClient.complete(message);
        } else {
            return sbClient.abandon(message);
        }
    }).subscribe();
Expected behavior
The receiver should handle transient errors and restart / resume receiving messages automatically
Setup (please complete the following information):
- OS: Ubuntu 16.04.7 LTS / AKS v1.17.11
- JRE: openjdk:8-jre-alpine
- com.azure.azure-messaging-servicebus 7.0.0
Information Checklist
Kindly make sure that you have added all the following information above and check off the required fields; otherwise we will treat the issue as an incomplete report
- Bug Description Added
- Repro Steps Added
- Setup information Added
Issue Analytics
- Created 3 years ago
- Comments:9 (4 by maintainers)
Hi @yuriy-osychenko, for the async code, both sbClient.complete(message) and sbClient.abandon(message) may throw exceptions. I suggest you add something like .onErrorResume to catch the exception so the reactive stream doesn't error out.
We're experiencing the same issue. Currently we are using the following setup:
Can you please help me understand whether there is a plan to fix this issue? I see that it was added to the May 2021 milestone, but the May milestone is already closed and this issue has not been postponed to any later milestone.
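The maintainer's suggestion earlier in the thread (handle errors from complete and abandon per message so one failed settlement does not end the stream) can be illustrated with a small stand-alone sketch. To keep it runnable without the Azure SDK or Reactor on the classpath, it uses CompletableFuture.exceptionally as a stand-in for Reactor's onErrorResume. The Settler interface, the class name, and the "lock expired" failure are all hypothetical and exist only for this illustration; this is not the SDK's API and it is not claimed to fix the idle-timeout behaviour reported here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class SettleWithFallback {

    /** Hypothetical stand-in for the receiver's settlement calls; NOT an Azure SDK type. */
    interface Settler {
        CompletableFuture<Void> complete(String message);
        CompletableFuture<Void> abandon(String message);
    }

    /**
     * Settles one message and turns a settlement failure into a recorded,
     * non-fatal result so that later messages can still be processed.
     */
    static CompletableFuture<Void> settle(Settler settler, String message,
                                          boolean processed, List<String> errors) {
        CompletableFuture<Void> settlement =
                processed ? settler.complete(message) : settler.abandon(message);
        // Rough Reactor equivalent (assumption, not compiled here):
        //   sbClient.complete(message).onErrorResume(e -> Mono.empty())
        return settlement.exceptionally(error -> {
            errors.add(message + ": " + error.getMessage());
            return null;
        });
    }

    /** Runs three messages through a fake settler whose complete() fails for "m2". */
    static String demo() {
        Settler flaky = new Settler() {
            @Override
            public CompletableFuture<Void> complete(String message) {
                CompletableFuture<Void> result = new CompletableFuture<>();
                if (message.equals("m2")) {
                    // Simulated settlement failure, invented for this sketch.
                    result.completeExceptionally(new IllegalStateException("lock expired"));
                } else {
                    result.complete(null);
                }
                return result;
            }

            @Override
            public CompletableFuture<Void> abandon(String message) {
                return CompletableFuture.completedFuture(null);
            }
        };

        List<String> errors = new ArrayList<>();
        List<String> handled = new ArrayList<>();
        for (String message : Arrays.asList("m1", "m2", "m3")) {
            settle(flaky, message, true, errors).join();
            handled.add(message);
        }
        return "handled=" + handled + " errors=" + errors;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Without the exceptionally step, join() on the failing settlement would throw and the loop would stop at "m2", which is analogous to an unhandled error inside flatMap terminating the whole Flux subscription.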