Consumer hangs (stops processing messages) after 20 minutes of inactivity

I have a problem with our task worker, which is built on top of Kombu (without Celery). After about 20 minutes without receiving any messages, the worker hangs and stops processing messages entirely, even when new ones arrive. When I shut it down with a keyboard interrupt, the shutdown takes about 60 seconds, during which it hangs after printing the [Kombu channel:1] basic_cancel('1') message. The process appears to hang somewhere in the low-level channel polling/messaging code.

Setup

The setup is simple: the worker listens on a RabbitMQ topic exchange for all messages matching a certain routing key. As it retrieves messages, it unpacks them (the messages themselves are protobuf objects) and then runs certain actions based on their content.

We use Python 2.7.9, Kombu 3.0.26 and have amqp 1.4.6 and librabbitmq 1.6.1 installed. For the consumers we use the Kombu Consumer Mixin that is called/run like this:

# Connect to RabbitMQ and start consuming messages
self._connection = Connection(config.rabbit_connection_url)

self._consumer = MyConsumer(self, self._connection)
self._consumer.run()

self._connection.close() # When _consumer.run finishes we are done too

Where rabbit_connection_url looks like "pyamqp://"+RABBITMQ_USERNAME+":"+RABBITMQ_PASSWORD+"@"+RABBITMQ_HOST+":"+str(RABBITMQ_PORT)+"/"+RABBITMQ_VIRTUAL_HOST. I have tried both pyamqp and librabbitmq as the URL scheme, to force the use of amqp and librabbitmq respectively, with no success.
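The string concatenation above breaks if the username, password or virtual host contains a reserved character such as '@', ':' or '/'. A minimal sketch of building the same URL with percent-encoding follows. The helper name build_amqp_url and the sample values are hypothetical, and it assumes the URL parser percent-decodes these components again.

```python
from urllib.parse import quote


def build_amqp_url(username, password, host, port, virtual_host, transport="pyamqp"):
    """Assemble a broker URL of the form transport://user:pass@host:port/vhost.

    Hypothetical helper for illustration; it is not part of Kombu.
    """
    # Percent-encode the parts that may contain reserved characters.
    user = quote(username, safe="")
    secret = quote(password, safe="")
    vhost = quote(virtual_host, safe="")
    return "{0}://{1}:{2}@{3}:{4}/{5}".format(transport, user, secret, host, port, vhost)


# Sample values are made up for the example.
url = build_amqp_url("guest", "p@ss/word", "localhost", 5672, "myvhost")
print(url)
```

Passing transport="librabbitmq" gives the second variant tried above.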

The implementation of MyConsumer:

from kombu.mixins import ConsumerMixin

class MyConsumer(ConsumerMixin):
    def __init__(self, service, conn):
        self.service = service
        self.connection = conn

    def get_consumers(self, Consumer, channel):
        return [Consumer(queues=self.queues, on_message=<callback function, not important here>)]

Symptoms

When I run the service, it connects without problems and starts handling messages. However, if the consumer is idle for about 20 minutes, it stops processing new messages. RabbitMQ lists the messages as ready in the queue, but the consumer never retrieves them. Even with debugging output turned on (os.environ.update(KOMBU_LOG_CHANNEL='1', KOMBU_LOG_CONNECTION='1')) I get no special output.

When the consumer hangs and I send a keyboard interrupt, the program prints ^C2015-09-15 11:50:17,580 - kombu.channel - DEBUG - [Kombu channel:1] basic_cancel('1') and then hangs for about 60 seconds before printing a stack trace and exiting. Below is the output from a full run. I have marked the point where a message was available for processing but was not processed:

2015-09-15 11:17:06,876 - kombu.connection - DEBUG - [Kombu connection:0x10fc06650] establishing connection...
2015-09-15 11:17:07,091 - kombu.connection - DEBUG - [Kombu connection:0x10fc06650] connection established: <kombu.transport.pyamqp.Connection object at 0x10fc066d0>
2015-09-15 11:17:07,092 - kombu.connection - DEBUG - [Kombu connection:0x10fc06650] create channel
2015-09-15 11:17:07,124 - kombu.channel - DEBUG - [Kombu channel:1] exchange_declare(nowait=False, exchange='MY_EXCHANGE', durable=True, passive=False, arguments=None, type='topic', auto_delete=False)
2015-09-15 11:17:07,147 - kombu.channel - DEBUG - [Kombu channel:1] queue_declare(passive=False, nowait=False, exclusive=False, durable=True, queue='scanservice-scan', arguments=None, auto_delete=False)
2015-09-15 11:17:07,174 - kombu.channel - DEBUG - [Kombu channel:1] queue_bind(queue='scanservice-scan', arguments=None, nowait=False, routing_key='scan.*', exchange='MY_EXCHANGE')
2015-09-15 11:17:07,201 - kombu.channel - DEBUG - [Kombu channel:1] basic_consume(queue='scanservice-scan', consumer_tag='1', nowait=False, no_ack=False, callback=<bound method Consumer._receive_callback of <Consumer: [<Queue scanservice-scan -> <Exchange MY_EXCHANGE(topic) bound to chan:1> -> scan.* bound to chan:1>]>>)
2015-09-15 11:17:07,231 - kombu.channel - DEBUG - [Kombu channel:1] message_to_python(<amqp.basic_message.Message object at 0x10fc06c90>)
2015-09-15 11:17:16,314 - kombu.connection - DEBUG - [Kombu connection:0x112b46450] acquired
2015-09-15 11:17:16,314 - kombu.connection - DEBUG - [Kombu connection:0x112b46450] establishing connection...
2015-09-15 11:17:16,413 - kombu.connection - DEBUG - [Kombu connection:0x112b46450] connection established: <kombu.transport.pyamqp.Connection object at 0x112b46550>
2015-09-15 11:17:16,413 - kombu.connection - DEBUG - [Kombu connection:0x112b46450] create channel
2015-09-15 11:17:16,434 - kombu.channel - DEBUG - [Kombu channel:1] prepare_message('\x00\x00\x01-\n\x05\x08\x03\x10\xad\x02\x12$8b830717-342a-4147-98c2-9144d60b6846 \x01', 0, 'application/octet-stream', 'binary', {}, {'delivery_mode': 2})
2015-09-15 11:17:16,434 - kombu.channel - DEBUG - [Kombu channel:1] _basic_publish(<amqp.basic_message.Message object at 0x112b46790>, mandatory=False, routing_key='scan.scan_page_updated', immediate=False, exchange='MY_EXCHANGE')
2015-09-15 11:17:16,435 - kombu.connection - DEBUG - [Kombu connection:0x112b46450] released
2015-09-15 11:17:16,442 - kombu.channel - DEBUG - [Kombu channel:1] message_to_python(<amqp.basic_message.Message object at 0x10fc2e250>)
2015-09-15 11:17:19,628 - kombu.connection - DEBUG - [Kombu connection:0x112b46450] acquired
2015-09-15 11:17:19,628 - kombu.channel - DEBUG - [Kombu channel:1] prepare_message('\x00\x00\x01.\n\x05\x08\x03\x10\xae\x02\x12$c649e902-fb54-4d4f-a987-37bfb892b4fa\x18\x01 \x02', 0, 'application/octet-stream', 'binary', {}, {'delivery_mode': 2})
2015-09-15 11:17:19,629 - kombu.channel - DEBUG - [Kombu channel:1] _basic_publish(<amqp.basic_message.Message object at 0x10fc71c10>, mandatory=False, routing_key='scan.scan_updated', immediate=False, exchange='MY_EXCHANGE')
2015-09-15 11:17:19,629 - kombu.connection - DEBUG - [Kombu connection:0x112b46450] released
2015-09-15 11:17:19,635 - kombu.channel - DEBUG - [Kombu channel:1] message_to_python(<amqp.basic_message.Message object at 0x10fc06f50>)
2015-09-15 11:17:20,172 - kombu.channel - DEBUG - [Kombu channel:1] message_to_python(<amqp.basic_message.Message object at 0x10fc06f90>)

####### 11:50:00 - NEW MESSAGE AVAILABLE, SHOULD HAVE BEEN PROCESSED HERE! #######

^C2015-09-15 11:50:17,580 - kombu.channel - DEBUG - [Kombu channel:1] basic_cancel('1')
2015-09-15 11:51:38,014 - kombu.channel - DEBUG - [Kombu channel:1] close()
2015-09-15 11:51:38,015 - kombu.connection - DEBUG - [Kombu connection:0x10fc06650] closed
Traceback (most recent call last):
  File "main.py", line 317, in <module>
    service.run()
  File "/Users/robin/Documents/Projects/X/src/msgs.py", line 98, in run
    self._consumer.run()
  File "/Users/robin/Documents/Projects/X/lib/python2.7/site-packages/kombu/mixins.py", line 170, in run
    for _ in self.consume(limit=None):  # pragma: no cover
  File "/Users/robin/Documents/Projects/X/lib/python2.7/site-packages/kombu/mixins.py", line 193, in consume
    conn.drain_events(timeout=safety_interval)
  File "/Users/robin/Documents/Projects/X/lib/python2.7/site-packages/kombu/connection.py", line 275, in drain_events
    return self.transport.drain_events(self.connection, **kwargs)
  File "/Users/robin/Documents/Projects/X/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 95, in drain_events
    return connection.drain_events(**kwargs)
  File "/Users/robin/Documents/Projects/X/lib/python2.7/site-packages/amqp/connection.py", line 302, in drain_events
    chanmap, None, timeout=timeout,
  File "/Users/robin/Documents/Projects/X/lib/python2.7/site-packages/amqp/connection.py", line 365, in _wait_multiple
    channel, method_sig, args, content = read_timeout(timeout)
  File "/Users/robin/Documents/Projects/X/lib/python2.7/site-packages/amqp/connection.py", line 336, in read_timeout
    return self.method_reader.read_method()
  File "/Users/robin/Documents/Projects/X/lib/python2.7/site-packages/amqp/method_framing.py", line 186, in read_method
    self._next_method()
  File "/Users/robin/Documents/Projects/X/lib/python2.7/site-packages/amqp/method_framing.py", line 107, in _next_method
    frame_type, channel, payload = read_frame()
  File "/Users/robin/Documents/Projects/X/lib/python2.7/site-packages/amqp/transport.py", line 154, in read_frame
    frame_header = read(7, True)
  File "/Users/robin/Documents/Projects/X/lib/python2.7/site-packages/amqp/transport.py", line 277, in _read
    s = recv(n - len(rbuf))
KeyboardInterrupt
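To put numbers on the log above, the timestamps can be subtracted directly. This is a small sketch using only the standard library. The helper name parse_log_time is hypothetical, the log lines are shortened, and the leading ^C is dropped from the basic_cancel line.

```python
from datetime import datetime

# Timestamp layout used by the log lines above, e.g. "2015-09-15 11:17:20,172".
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_log_time(line):
    """Parse the 23-character timestamp at the start of a log line."""
    return datetime.strptime(line[:23], LOG_TIME_FORMAT)


last_message = parse_log_time("2015-09-15 11:17:20,172 - kombu.channel - DEBUG - message_to_python")
cancel = parse_log_time("2015-09-15 11:50:17,580 - kombu.channel - DEBUG - basic_cancel")
closed = parse_log_time("2015-09-15 11:51:38,014 - kombu.channel - DEBUG - close()")

# Time between the last processed message and the keyboard interrupt.
idle_seconds = (cancel - last_message).total_seconds()
# Time the shutdown spent between basic_cancel and close().
shutdown_seconds = (closed - cancel).total_seconds()

print("idle before interrupt: %.1f s" % idle_seconds)
print("shutdown delay: %.1f s" % shutdown_seconds)
```

For this run the consumer had been idle for roughly 33 minutes when it was interrupted, and about 80 seconds passed between basic_cancel('1') and close().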

Reproducibility

This happens on all our systems: in dev mode on OS X 10.10.5 as well as on all our deployed systems (Ubuntu on Docker). The 20 minutes is an estimate and may be closer to 18 minutes, but the consumers consistently hang if they go without processing a message for roughly that long.

I am out of ideas as to what the issue may be. The same setup works reliably when consuming messages with other libraries in Java. Any ideas on how to debug this, or where the problem may be?

Issue Analytics

  • State: closed
  • Created 8 years ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

1 reaction
jkbbwr commented, Oct 25, 2016

Can I get any update on this issue as it seems to be happening in our environment also.

0 reactions
jkbbwr commented, Dec 6, 2016

Pro tip: check your firewalls. We had this issue and it went away when we reviewed our internal firewalls. It's to do with the consumer never firing back an ACK to the message.
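The thread does not say what the firewall review changed. One common cause of this symptom, offered here only as an assumption, is a firewall that silently drops TCP connections after an idle period. The sketch below shows how TCP keepalive can be enabled on a plain socket with the standard library. The function name and timing values are hypothetical, and it does not show how to hand such a socket to Kombu.

```python
import socket


def enable_tcp_keepalive(sock, idle=60, interval=10, probes=5):
    """Ask the OS to send keepalive probes on an otherwise idle TCP connection."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These tuning options are platform-specific, so set only those that exist.
    tuning = (("TCP_KEEPIDLE", idle), ("TCP_KEEPINTVL", interval), ("TCP_KEEPCNT", probes))
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


sock = enable_tcp_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
# A nonzero value means keepalive is switched on for this socket.
keepalive_enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
print("keepalive enabled:", keepalive_enabled)
sock.close()
```

At the AMQP level, Kombu's Connection also accepts a heartbeat argument, which may be the simpler thing to try first. Whether either measure resolves this particular hang is not confirmed by the thread.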
