[BUG] Java Service Bus async receiver stops receiving new messages after 'Transient error occurred'

Describe the bug

We upgraded a Service Bus async receiver component from azure-servicebus 3.4.0 to azure-messaging-servicebus 7.0.0. Every 2 to 8 hours, the new version stops processing new messages exactly ten minutes after the previous event and never recovers.

After restarting the component, queued messages are processed normally. Messages generally arrive on our queue every 1 to 30 minutes. A ten-minute gap between messages does not always trigger the failure, but the failure always follows a ten-minute gap.

Exception or Stack Trace

2021-01-29 06:26:20,428 [boundedElastic-2] INFO  <our code> - Message Acked

<exactly ten minutes elapsed without any other log messages>

2021-01-29 06:36:20,583 [single-1] WARN  c.a.m.s.i.ServiceBusReceiveLinkProcessor - linkName[n/a] entityPath[n/a]. Transient error occurred. Attempt: 1. Retrying after 4511 ms.
The link 'G12:45185721:eph-messages_e7417e_1611901335622' is force detached. Code: consumer(link184908). Details: AmqpMessageConsumer.IdleTimerExpired: Idle timeout: 00:10:00. TrackingId:15928218000002070002d24c6013a998_G12_B9, SystemTracker:example:Queue:eph-messages, Timestamp:2021-01-29T06:36:20, errorContext[NAMESPACE: example.servicebus.windows.net, PATH: eph-messages, REFERENCE_ID: eph-messages_e7417e_1611901335622, LINK_CREDIT: 0]
2021-01-29 06:36:25,098 [parallel-1] WARN  c.a.m.s.i.ServiceBusReceiveLinkProcessor - linkName[n/a] entityPath[n/a]. Transient error occurred. Attempt: 2. Retrying after 14575 ms.
The link 'G12:45185721:eph-messages_e7417e_1611901335622' is force detached. Code: consumer(link184908). Details: AmqpMessageConsumer.IdleTimerExpired: Idle timeout: 00:10:00. TrackingId:15928218000002070002d24c6013a998_G12_B9, SystemTracker:example:Queue:eph-messages, Timestamp:2021-01-29T06:36:20, errorContext[NAMESPACE: example.servicebus.windows.net, PATH: eph-messages, REFERENCE_ID: eph-messages_e7417e_1611901335622, LINK_CREDIT: 0]
2021-01-29 06:46:25,312 [single-1] WARN  c.a.m.s.i.ServiceBusReceiveLinkProcessor - linkName[n/a] entityPath[n/a]. Transient error occurred. Attempt: 1. Retrying after 4511 ms.
The link 'G12:45402209:eph-messages_e7417e_1611901335622' is force detached. Code: consumer(link185672). Details: AmqpMessageConsumer.IdleTimerExpired: Idle timeout: 00:10:00. TrackingId:15928218000002070002d5486013ace9_G12_B9, SystemTracker:example:Queue:eph-messages, Timestamp:2021-01-29T06:46:25, errorContext[NAMESPACE: example.servicebus.windows.net, PATH: eph-messages, REFERENCE_ID: eph-messages_e7417e_1611901335622, LINK_CREDIT: 0]
2021-01-29 06:46:29,824 [parallel-1] WARN  c.a.m.s.i.ServiceBusReceiveLinkProcessor - linkName[n/a] entityPath[n/a]. Transient error occurred. Attempt: 2. Retrying after 14575 ms.
The link 'G12:45402209:eph-messages_e7417e_1611901335622' is force detached. Code: consumer(link185672). Details: AmqpMessageConsumer.IdleTimerExpired: Idle timeout: 00:10:00. TrackingId:15928218000002070002d5486013ace9_G12_B9, SystemTracker:example:Queue:eph-messages, Timestamp:2021-01-29T06:46:25, errorContext[NAMESPACE: example.servicebus.windows.net, PATH: eph-messages, REFERENCE_ID: eph-messages_e7417e_1611901335622, LINK_CREDIT: 0]
2021-01-29 06:56:29,910 [single-1] WARN  c.a.m.s.i.ServiceBusReceiveLinkProcessor - linkName[n/a] entityPath[n/a]. Transient error occurred. Attempt: 1. Retrying after 4511 ms.
The link 'G12:45555741:eph-messages_e7417e_1611901335622' is force detached. Code: consumer(link186205). Details: AmqpMessageConsumer.IdleTimerExpired: Idle timeout: 00:10:00. TrackingId:15928218000002070002d75d6013af45_G12_B9, SystemTracker:example:Queue:eph-messages, Timestamp:2021-01-29T06:56:29, errorContext[NAMESPACE: example.servicebus.windows.net, PATH: eph-messages, REFERENCE_ID: eph-messages_e7417e_1611901335622, LINK_CREDIT: 0]
2021-01-29 06:56:34,423 [parallel-1] WARN  c.a.m.s.i.ServiceBusReceiveLinkProcessor - linkName[n/a] entityPath[n/a]. Transient error occurred. Attempt: 2. Retrying after 14575 ms.
The link 'G12:45555741:eph-messages_e7417e_1611901335622' is force detached. Code: consumer(link186205). Details: AmqpMessageConsumer.IdleTimerExpired: Idle timeout: 00:10:00. TrackingId:15928218000002070002d75d6013af45_G12_B9, SystemTracker:example:Queue:eph-messages, Timestamp:2021-01-29T06:56:29, errorContext[NAMESPACE: example.servicebus.windows.net, PATH: eph-messages, REFERENCE_ID: eph-messages_e7417e_1611901335622, LINK_CREDIT: 0]
2021-01-29 07:01:20,520 [single-1] WARN  c.a.c.a.i.RequestResponseChannel - Retry #1. Transient error occurred. Retrying after 4511 ms.
The connection was inactive for more than the allowed 300000 milliseconds and is closed by container 'LinkTracker'. TrackingId:bcdcd068e0c64840ada277486e9503bb_G1S1, SystemTracker:gateway5, Timestamp:2021-01-29T07:01:20, errorContext[NAMESPACE: pqmmjeeventhub001-ns.servicebus.windows.net, PATH: $cbs, REFERENCE_ID: cbs:sender, LINK_CREDIT: 98]
2021-01-29 07:01:20,524 [single-1] ERROR c.a.c.a.i.RequestResponseChannel - cbs - Exception in RequestResponse links. Disposing and clearing unconfirmed sends.
The connection was inactive for more than the allowed 300000 milliseconds and is closed by container 'LinkTracker'. TrackingId:bcdcd068e0c64840ada277486e9503bb_G1S1, SystemTracker:gateway5, Timestamp:2021-01-29T07:01:20, errorContext[NAMESPACE: pqmmjeeventhub001-ns.servicebus.windows.net, PATH: $cbs, REFERENCE_ID: cbs:sender, LINK_CREDIT: 98]
2021-01-29 07:01:25,034 [parallel-1] WARN  c.a.c.a.i.RequestResponseChannel - Non-retryable error occurred in connection.
2021-01-29 07:06:34,528 [single-1] WARN  c.a.m.s.i.ServiceBusReceiveLinkProcessor - linkName[n/a] entityPath[n/a]. Transient error occurred. Attempt: 1. Retrying after 4511 ms.
The link 'G12:45709936:eph-messages_e7417e_1611901335622' is force detached. Code: consumer(link186724). Details: AmqpMessageConsumer.IdleTimerExpired: Idle timeout: 00:10:00. TrackingId:15928218000002070002d9646013b1a2_G12_B9, SystemTracker:example:Queue:eph-messages, Timestamp:2021-01-29T07:06:34, errorContext[NAMESPACE: example.servicebus.windows.net, PATH: eph-messages, REFERENCE_ID: eph-messages_e7417e_1611901335622, LINK_CREDIT: 0]
2021-01-29 07:06:39,040 [parallel-1] WARN  c.a.m.s.i.ServiceBusReceiveLinkProcessor - linkName[n/a] entityPath[n/a]. Transient error occurred. Attempt: 2. Retrying after 14575 ms.
The link 'G12:45709936:eph-messages_e7417e_1611901335622' is force detached. Code: consumer(link186724). Details: AmqpMessageConsumer.IdleTimerExpired: Idle timeout: 00:10:00. TrackingId:15928218000002070002d9646013b1a2_G12_B9, SystemTracker:example:Queue:eph-messages, Timestamp:2021-01-29T07:06:34, errorContext[NAMESPACE: example.servicebus.windows.net, PATH: eph-messages, REFERENCE_ID: eph-messages_e7417e_1611901335622, LINK_CREDIT: 0]
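
The timing pattern in the log above can be checked with a little arithmetic. The following is a minimal sketch using only `java.time`; the class `LogGaps` and its `gap` helper are hypothetical names introduced for illustration, and the timestamps are copied from the log lines above.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class LogGaps {
    // Matches the timestamp prefix of the log lines, e.g. "2021-01-29 06:26:20,428".
    private static final DateTimeFormatter LOG_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Elapsed time between two log timestamps.
    public static Duration gap(String earlier, String later) {
        return Duration.between(
                LocalDateTime.parse(earlier, LOG_FORMAT),
                LocalDateTime.parse(later, LOG_FORMAT));
    }

    public static void main(String[] args) {
        // Last "Message Acked" to the first force-detach warning.
        System.out.println(gap("2021-01-29 06:26:20,428", "2021-01-29 06:36:20,583").toMillis());
        // "Attempt: 1" to "Attempt: 2" (the log announced a 4511 ms retry delay).
        System.out.println(gap("2021-01-29 06:36:20,583", "2021-01-29 06:36:25,098").toMillis());
        // First "Attempt: 1" warning to the next "Attempt: 1" warning.
        System.out.println(gap("2021-01-29 06:36:20,583", "2021-01-29 06:46:25,312").toMillis());
    }
}
```

The first gap is 600155 ms, which lines up with the `Idle timeout: 00:10:00` reported by the service, and the detach warnings then repeat roughly every ten minutes.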

To Reproduce

The code is deployed in Azure Kubernetes Service and consistently fails with this error within 2 to 8 hours.

Code Snippet

Here is how we are connecting to the Service Bus and processing messages:

  sbClient = new ServiceBusClientBuilder()
          .connectionString(serviceBusEndPoint)
          .receiver()
          .disableAutoComplete()
          .queueName(serviceBusQueueName)
          .buildAsyncClient();
  sbClient.receiveMessages()
          .flatMap(message -> {
              boolean messageProcessedStatus = processMessage(message);
              if (messageProcessedStatus) {
                  logger.info("Message Acked");
                  return sbClient.complete(message);
              } else {
                  return sbClient.abandon(message);
              }
          }).subscribe();

Expected behavior

The receiver should handle transient errors and automatically restart or resume receiving messages.

Setup (please complete the following information):

  • OS: Ubuntu 16.04.7 LTS / AKS v1.17.11
  • JRE: openjdk:8-jre-alpine
  • com.azure.azure-messaging-servicebus 7.0.0

Information Checklist

Kindly make sure that you have added all the following information above and checked off the required fields; otherwise we will treat the issue as an incomplete report.

  • Bug Description Added
  • Repro Steps Added
  • Setup information Added

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 9 (4 by maintainers)

Top GitHub Comments

YijunXieMS commented, Jul 28, 2021 (1 reaction)

Hi @yuriy-osychenko, in the async code, both sbClient.complete(message) and sbClient.abandon(message) may fail with an exception. I suggest adding something like .onErrorResume to catch the exception so the reactive stream doesn’t terminate with an error.
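
The suggestion above can be sketched as follows. This is a hedged illustration, not the maintainers' code: `ResilientSettlement` and `settleEach` are hypothetical names, the helper is written against plain Reactor types so it does not depend on a live Service Bus connection, and it assumes `reactor-core` is on the classpath (the Azure SDK already pulls it in). The thread does not confirm that this resolves the idle-timeout stall; it only guards the settlement step.

```java
import java.util.function.Function;
import java.util.function.Predicate;

import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

final class ResilientSettlement {

    private ResilientSettlement() {
    }

    /**
     * Processes each message, then completes or abandons it. A failure while
     * processing or settling one message is logged and swallowed, so it does
     * not terminate the stream for the messages that follow.
     */
    public static <T> Flux<Void> settleEach(
            Flux<T> messages,
            Predicate<T> processor,
            Function<T, Mono<Void>> complete,
            Function<T, Mono<Void>> abandon) {
        return messages.flatMap(message -> {
            // defer() turns exceptions thrown synchronously by the processor
            // or the settlement call into an error signal on this inner Mono.
            Mono<Void> settlement = Mono.defer(() ->
                    processor.test(message)
                            ? complete.apply(message)
                            : abandon.apply(message));
            // Swallow the error for this message only and keep the outer Flux alive.
            return settlement.onErrorResume(error -> {
                System.err.println("Settlement failed, continuing: " + error);
                return Mono.empty();
            });
        });
    }
}
```

With the receiver from the issue, the call would look roughly like `ResilientSettlement.settleEach(sbClient.receiveMessages(), message -> processMessage(message), sbClient::complete, sbClient::abandon).subscribe();`. A message whose settlement fails is left unsettled, so it should become available again once its lock expires.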

yuriy-osychenko commented, Jul 27, 2021 (0 reactions)

We’re experiencing the same issue. We are currently using the following setup:

  • azure-spring-cloud-stream-binder-servicebus-topic version 2.5.0
  • azure-messaging-servicebus version 7.2.3
  • openjdk:11-jre-slim
  • Azure Kubernetes Service

Can you please help me understand whether there is a plan to fix this issue? I see that it was added to the May 2021 milestone, but that milestone is already closed and the issue has not been moved to a later one.
