Consumer hangs (stops processing messages) after 20 minutes of inactivity
I have a problem with our task worker built on top of Kombu (without using Celery): after about 20 minutes of not receiving any messages, the worker hangs and stops processing messages entirely, even when new messages arrive. When I shut it down with a keyboard interrupt, the shutdown takes about 60 seconds, during which it hangs after printing the [Kombu channel:1] basic_cancel('1') message. The process appears to hang somewhere in the low-level channel polling/messaging (?)
Setup
The setup is simple: the worker listens on a RabbitMQ topic exchange for all messages matching a certain routing key. As it retrieves messages, it unpacks them (the messages themselves are protobuf objects) and then runs certain actions based on their content.
We use Python 2.7.9 and Kombu 3.0.26, and have amqp 1.4.6 and librabbitmq 1.6.1 installed. For the consumers we use the Kombu ConsumerMixin, which is run like this:
# Connect to rabbitMQ and start consuming messages
self._connection = Connection(config.rabbit_connection_url)
self._consumer = MyConsumer(self, self._connection)
self._consumer.run()
self._connection.close() # When _consumer.run finishes we are done too
Where rabbit_connection_url looks like "pyamqp://"+RABBITMQ_USERNAME+":"+RABBITMQ_PASSWORD+"@"+RABBITMQ_HOST+":"+str(RABBITMQ_PORT)+"/"+RABBITMQ_VIRTUAL_HOST. I have tried both pyamqp and librabbitmq as the URL scheme, to force the use of amqp and librabbitmq respectively, with no success.
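As a side note on the URL itself, plain string concatenation can produce an ambiguous URL if the password or virtual host contains characters such as "@", ":" or "/". The following is a minimal sketch, not the code used in this report: build_amqp_url is a hypothetical helper, and it assumes that percent-encoded credentials are decoded again when the URL is parsed.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2.7, as used in this report


def build_amqp_url(username, password, host, port, virtual_host, scheme="pyamqp"):
    """Build a broker URL, percent-encoding the parts that may contain
    reserved characters (hypothetical helper, for illustration only)."""
    return "{0}://{1}:{2}@{3}:{4}/{5}".format(
        scheme,
        quote(username, safe=""),
        quote(password, safe=""),
        host,
        int(port),
        quote(virtual_host, safe=""),
    )


# Example values are made up; only the URL layout follows the report.
url = build_amqp_url("guest", "p@ss:word", "localhost", 5672, "scans")
```

The resulting string can be passed wherever config.rabbit_connection_url is used above. This only affects how the URL is assembled and is not expected to change the hanging behaviour.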
The implementation of MyConsumer:
from kombu.mixins import ConsumerMixin

class MyConsumer(ConsumerMixin):
    def __init__(self, service, conn):
        self.service = service
        # ConsumerMixin expects the connection in self.connection
        self.connection = conn

    def get_consumers(self, Consumer, channel):
        return [Consumer(queues=self.queues, on_message=<callback function, not important here>)]
Symptoms
When I run the service, it connects without problems and starts handling messages. However, if the consumer is idle for about 20 minutes, it stops processing new messages. RabbitMQ lists the messages as ready in the queue, but the consumer never retrieves them. Even with debugging output turned on (os.environ.update(KOMBU_LOG_CHANNEL='1', KOMBU_LOG_CONNECTION='1')) I get no special output. When the consumer hangs and I send a keyboard interrupt, the program prints ^C2015-09-15 11:50:17,580 - kombu.channel - DEBUG - [Kombu channel:1] basic_cancel('1') and then hangs for about 60 seconds before printing a stack trace and exiting. Below is the output from a full run; I have marked the point where a message was available for processing but was not processed:
2015-09-15 11:17:06,876 - kombu.connection - DEBUG - [Kombu connection:0x10fc06650] establishing connection...
2015-09-15 11:17:07,091 - kombu.connection - DEBUG - [Kombu connection:0x10fc06650] connection established: <kombu.transport.pyamqp.Connection object at 0x10fc066d0>
2015-09-15 11:17:07,092 - kombu.connection - DEBUG - [Kombu connection:0x10fc06650] create channel
2015-09-15 11:17:07,124 - kombu.channel - DEBUG - [Kombu channel:1] exchange_declare(nowait=False, exchange='MY_EXCHANGE', durable=True, passive=False, arguments=None, type='topic', auto_delete=False)
2015-09-15 11:17:07,147 - kombu.channel - DEBUG - [Kombu channel:1] queue_declare(passive=False, nowait=False, exclusive=False, durable=True, queue='scanservice-scan', arguments=None, auto_delete=False)
2015-09-15 11:17:07,174 - kombu.channel - DEBUG - [Kombu channel:1] queue_bind(queue='scanservice-scan', arguments=None, nowait=False, routing_key='scan.*', exchange='MY_EXCHANGE')
2015-09-15 11:17:07,201 - kombu.channel - DEBUG - [Kombu channel:1] basic_consume(queue='scanservice-scan', consumer_tag='1', nowait=False, no_ack=False, callback=<bound method Consumer._receive_callback of <Consumer: [<Queue scanservice-scan -> <Exchange MY_EXCHANGE(topic) bound to chan:1> -> scan.* bound to chan:1>]>>)
2015-09-15 11:17:07,231 - kombu.channel - DEBUG - [Kombu channel:1] message_to_python(<amqp.basic_message.Message object at 0x10fc06c90>)
2015-09-15 11:17:16,314 - kombu.connection - DEBUG - [Kombu connection:0x112b46450] acquired
2015-09-15 11:17:16,314 - kombu.connection - DEBUG - [Kombu connection:0x112b46450] establishing connection...
2015-09-15 11:17:16,413 - kombu.connection - DEBUG - [Kombu connection:0x112b46450] connection established: <kombu.transport.pyamqp.Connection object at 0x112b46550>
2015-09-15 11:17:16,413 - kombu.connection - DEBUG - [Kombu connection:0x112b46450] create channel
2015-09-15 11:17:16,434 - kombu.channel - DEBUG - [Kombu channel:1] prepare_message('\x00\x00\x01-\n\x05\x08\x03\x10\xad\x02\x12$8b830717-342a-4147-98c2-9144d60b6846 \x01', 0, 'application/octet-stream', 'binary', {}, {'delivery_mode': 2})
2015-09-15 11:17:16,434 - kombu.channel - DEBUG - [Kombu channel:1] _basic_publish(<amqp.basic_message.Message object at 0x112b46790>, mandatory=False, routing_key='scan.scan_page_updated', immediate=False, exchange='MY_EXCHANGE')
2015-09-15 11:17:16,435 - kombu.connection - DEBUG - [Kombu connection:0x112b46450] released
2015-09-15 11:17:16,442 - kombu.channel - DEBUG - [Kombu channel:1] message_to_python(<amqp.basic_message.Message object at 0x10fc2e250>)
2015-09-15 11:17:19,628 - kombu.connection - DEBUG - [Kombu connection:0x112b46450] acquired
2015-09-15 11:17:19,628 - kombu.channel - DEBUG - [Kombu channel:1] prepare_message('\x00\x00\x01.\n\x05\x08\x03\x10\xae\x02\x12$c649e902-fb54-4d4f-a987-37bfb892b4fa\x18\x01 \x02', 0, 'application/octet-stream', 'binary', {}, {'delivery_mode': 2})
2015-09-15 11:17:19,629 - kombu.channel - DEBUG - [Kombu channel:1] _basic_publish(<amqp.basic_message.Message object at 0x10fc71c10>, mandatory=False, routing_key='scan.scan_updated', immediate=False, exchange='MY_EXCHANGE')
2015-09-15 11:17:19,629 - kombu.connection - DEBUG - [Kombu connection:0x112b46450] released
2015-09-15 11:17:19,635 - kombu.channel - DEBUG - [Kombu channel:1] message_to_python(<amqp.basic_message.Message object at 0x10fc06f50>)
2015-09-15 11:17:20,172 - kombu.channel - DEBUG - [Kombu channel:1] message_to_python(<amqp.basic_message.Message object at 0x10fc06f90>)
####### 11:50:00 - NEW MESSAGE AVAILABLE, SHOULD HAVE BEEN PROCESSED HERE! #######
^C2015-09-15 11:50:17,580 - kombu.channel - DEBUG - [Kombu channel:1] basic_cancel('1')
2015-09-15 11:51:38,014 - kombu.channel - DEBUG - [Kombu channel:1] close()
2015-09-15 11:51:38,015 - kombu.connection - DEBUG - [Kombu connection:0x10fc06650] closed
Traceback (most recent call last):
  File "main.py", line 317, in <module>
    service.run()
  File "/Users/robin/Documents/Projects/X/src/msgs.py", line 98, in run
    self._consumer.run()
  File "/Users/robin/Documents/Projects/X/lib/python2.7/site-packages/kombu/mixins.py", line 170, in run
    for _ in self.consume(limit=None): # pragma: no cover
  File "/Users/robin/Documents/Projects/X/lib/python2.7/site-packages/kombu/mixins.py", line 193, in consume
    conn.drain_events(timeout=safety_interval)
  File "/Users/robin/Documents/Projects/X/lib/python2.7/site-packages/kombu/connection.py", line 275, in drain_events
    return self.transport.drain_events(self.connection, **kwargs)
  File "/Users/robin/Documents/Projects/X/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 95, in drain_events
    return connection.drain_events(**kwargs)
  File "/Users/robin/Documents/Projects/X/lib/python2.7/site-packages/amqp/connection.py", line 302, in drain_events
    chanmap, None, timeout=timeout,
  File "/Users/robin/Documents/Projects/X/lib/python2.7/site-packages/amqp/connection.py", line 365, in _wait_multiple
    channel, method_sig, args, content = read_timeout(timeout)
  File "/Users/robin/Documents/Projects/X/lib/python2.7/site-packages/amqp/connection.py", line 336, in read_timeout
    return self.method_reader.read_method()
  File "/Users/robin/Documents/Projects/X/lib/python2.7/site-packages/amqp/method_framing.py", line 186, in read_method
    self._next_method()
  File "/Users/robin/Documents/Projects/X/lib/python2.7/site-packages/amqp/method_framing.py", line 107, in _next_method
    frame_type, channel, payload = read_frame()
  File "/Users/robin/Documents/Projects/X/lib/python2.7/site-packages/amqp/transport.py", line 154, in read_frame
    frame_header = read(7, True)
  File "/Users/robin/Documents/Projects/X/lib/python2.7/site-packages/amqp/transport.py", line 277, in _read
    s = recv(n - len(rbuf))
KeyboardInterrupt
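The timestamps in the log above make it possible to put numbers on the idle period and on the shutdown delay. The sketch below only does that arithmetic; seconds_between is a hypothetical helper name, and the timestamps are copied from the log lines for the last received message, basic_cancel and close().

```python
from datetime import datetime

# Timestamp layout used by the log lines above, e.g. "2015-09-15 11:17:20,172"
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def seconds_between(start, end):
    """Return the gap in seconds between two log timestamps."""
    t0 = datetime.strptime(start, LOG_TIME_FORMAT)
    t1 = datetime.strptime(end, LOG_TIME_FORMAT)
    return (t1 - t0).total_seconds()


# Last message_to_python entry -> basic_cancel after the keyboard interrupt
idle_before_interrupt = seconds_between(
    "2015-09-15 11:17:20,172", "2015-09-15 11:50:17,580")

# basic_cancel -> close(), i.e. how long the shutdown was stuck
shutdown_delay = seconds_between(
    "2015-09-15 11:50:17,580", "2015-09-15 11:51:38,014")
```

For this particular run the consumer had been silent for roughly 33 minutes when it was interrupted, and the gap between basic_cancel and close() is about 80 seconds, somewhat longer than the "about 60 seconds" estimated above.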
Reproducibility
This happens on all our systems: in dev mode on OS X 10.10.5 as well as on all our deployed systems (Ubuntu on Docker). The 20 minutes are somewhat of a guess; it may also be 18 minutes, but the consumers consistently hang if they have not processed a message for roughly 20 minutes.
I am really out of ideas about what the issue may be; the same setup works reliably for consuming messages with other libraries in Java. Any ideas on how to debug this or where the problem may be?
Issue Analytics
- Created 8 years ago
- Comments:5 (1 by maintainers)
Top GitHub Comments
Can I get an update on this issue? It seems to be happening in our environment as well.
Pro tip: check your firewalls. We had this issue and it went away once we reviewed our internal firewalls. It has to do with the consumer never firing back an ACK to the message.
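If a firewall or NAT device is silently dropping the idle TCP connection, as the comment above suggests, one common mitigation is to enable AMQP heartbeats so that the connection never looks idle. This is a minimal sketch and not a confirmed fix for this issue: consume_with_heartbeat and should_stop are hypothetical names, and it assumes the connection object is a kombu Connection created with a matching heartbeat value, for example Connection(url, heartbeat=10).

```python
import socket


def consume_with_heartbeat(connection, should_stop, heartbeat=10, rate=2):
    """Drain events and run a heartbeat check whenever the connection is idle.

    `connection` is assumed to behave like a kombu Connection, i.e. it offers
    drain_events(timeout=...) and heartbeat_check(rate=...).
    `should_stop` is a hypothetical callable returning True to end the loop.
    """
    # Wake up `rate` times per heartbeat period even if no message arrives.
    interval = float(heartbeat) / rate
    while not should_stop():
        try:
            connection.drain_events(timeout=interval)
        except socket.timeout:
            # Nothing arrived within `interval` seconds: let the transport
            # send a heartbeat and verify the broker is still responding.
            connection.heartbeat_check(rate=rate)
```

With heartbeats enabled, a dead connection should surface as an error from heartbeat_check rather than as a silent hang, which gives the worker a chance to reconnect. Whether the ConsumerMixin in Kombu 3.0.26 performs this check on its own is not confirmed here, so treat the loop as an illustration of the pattern rather than a drop-in replacement for run().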