No exception raised when heartbeat timed out
Our team is migrating an existing project from pika 0.12.0 to the latest release, 1.2.0. The project relies on the heartbeat-timeout exception raised by pika to detect that the network is disconnected. This works with 0.12.0 but not with 1.2.0, where the exception no longer seems to be raised when the network goes down.
A code snippet used to reproduce this issue is as follows:
```python
import json
import time

import pika

# Values in angle brackets are placeholders from the original report; fill in real values.
credentials = pika.PlainCredentials(<username>, <password>)
params = pika.ConnectionParameters(<server-ip>, <server-port>, '/', credentials, heartbeat=60)
data = json.dumps(<some-data>)

conn = pika.BlockingConnection(params)
chan = conn.channel()
chan.exchange_declare(
    exchange=<exchange-name>,
    exchange_type='topic',
    durable=False,
    auto_delete=True
)

# Publish every 5 seconds; unplug the network cable once the first line is printed.
while True:
    time.sleep(5)
    print('publishing...')
    chan.basic_publish(
        exchange=<exchange-name>,
        routing_key=<routing-key>,
        body=data
    )
```
To compare versions, we created separate virtual environments and pip-installed a different pika version into each. We then ran the same snippet and manually disconnected the network (by unplugging the Ethernet cable) once the loop had started, i.e. after the first `publishing...` line appeared. The results differed. With pika 0.12.0, an exception is raised that breaks the loop and terminates the program, which is what we want. With 1.2.0, the loop appears to run forever: `basic_publish()` keeps being called, although we believe the calls no longer have any effect.
We inspected the 1.2.0 source code and found that `HeartbeatChecker` does successfully detect the idle connection, but we cannot figure out why no exception is raised.
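A common way to detect a dead peer with heartbeats is to check, on a timer, whether any bytes arrived since the previous check. The class below is a hypothetical sketch of that idea. `IdleDetector`, `received` and `check` are illustrative names, not pika's `HeartbeatChecker` API. It shows that detection on its own only invokes a callback. Turning that callback into an exception for the caller is a separate step.

```python
class IdleDetector:
    """Hypothetical model of heartbeat-based idle detection (not pika's HeartbeatChecker)."""

    def __init__(self, on_timeout):
        self._on_timeout = on_timeout      # invoked when the peer looks dead
        self._bytes_received = 0           # total inbound bytes seen so far
        self._bytes_at_last_check = 0      # total at the previous timer tick
        self.timed_out = False

    def received(self, n):
        # Called by the I/O layer whenever inbound data (including heartbeats) arrives.
        self._bytes_received += n

    def check(self):
        # Called periodically by a timer: no new inbound bytes means the peer is presumed gone.
        if self._bytes_received == self._bytes_at_last_check:
            self.timed_out = True
            self._on_timeout()             # only a callback; nothing is raised here
        self._bytes_at_last_check = self._bytes_received
```

In this model, whatever `on_timeout` does (for example, queueing a "connection lost" notification) determines whether application code ever notices the timeout.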
Our RabbitMQ server is running version 3.6.10 with Erlang 20.2.2.
Was this behavior change between 0.12.0 and 1.2.0 intentional, or is it an accidental regression?
Any input is welcome!
Thanks to all contributors for your great efforts!
Issue created 2 years ago; 6 comments (2 by maintainers).
Top GitHub Comments
I'm pretty sure I encountered the same problem. After analyzing the code, I see that once the `HeartbeatChecker` aborts the connection, `BlockingConnection._flush_output` stops running callbacks, for the following reason. After the abort, no more data is aggregated in the write buffer, so `self._impl._get_write_buffer_size() == 0` is true. The `waiters` list also contains only the `lambda: True` function (`_ALWAYS_READY_WAITERS`). As a result, the body of the `while` loop never runs. Since `process_timeouts` is responsible for popping callbacks from the `_callbacks` queue, the relevant callback, `_connection_lost_notify_async`, never runs either. A possible solution is to add one more call to `process_timeouts` right after the `while` loop. It is not the best solution, though. This specific scenario could probably be identified so that irrelevant calls are avoided.

I have narrowed down the change that introduced this behavior to the major release 0.13.1 -> 1.0.0.
It seems that the callbacks (one of which raises the wanted exception) are not executed unless the underlying asynchronous I/O loop is made to tick by calling `BlockingChannel.start_consuming()`, `BlockingConnection.process_data_events()`, or `BlockingConnection.sleep()`.
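The mechanism described in the first comment can be reproduced with a small pure-Python model. Everything here is hypothetical and only mirrors the shape of the logic described in the thread; it is not pika's code. That includes `MiniBlockingConnection`, `StreamLost`, `ALWAYS_READY` and the `extra_tick` flag. In the model, the heartbeat timeout merely queues a notification, and the flush loop drains that queue only while output is pending or no waiter is satisfied.

```python
from collections import deque

# Hypothetical stand-in for _ALWAYS_READY_WAITERS: a single waiter that is always satisfied.
ALWAYS_READY = (lambda: True,)


class StreamLost(Exception):
    """Hypothetical error raised once the connection-lost notification has run."""


class MiniBlockingConnection:
    """Simplified model of the flush logic discussed in the thread (not pika code)."""

    def __init__(self):
        self.write_buffer = bytearray()
        self.callbacks = deque()  # models the I/O loop's queue of ready callbacks
        self.lost = False

    def heartbeat_timeout(self):
        # Detection only *schedules* the notification; nothing is raised yet.
        self.callbacks.append(self._connection_lost_notify)

    def _connection_lost_notify(self):
        self.lost = True

    def process_timeouts(self):
        # Run every callback queued so far.
        while self.callbacks:
            self.callbacks.popleft()()

    def flush_output(self, waiters=ALWAYS_READY, extra_tick=False):
        # Keep ticking while output is pending or no waiter is satisfied yet.
        while self.write_buffer or not any(w() for w in waiters):
            self.write_buffer.clear()  # pretend one poll wrote all pending output
            self.process_timeouts()
        if extra_tick:
            # The extra call suggested in the comment: drain callbacks even if the loop never ran.
            self.process_timeouts()
        if self.lost:
            raise StreamLost("connection lost (heartbeat timeout)")
```

Consider the case where the write buffer is empty and the only waiter is the always-ready one. Here `flush_output()` returns without running the queued notification, which matches the silently repeating `basic_publish()` loop in the report. Passing `extra_tick=True` makes the error surface. If output is still pending, the loop body runs once and the error surfaces even without the extra call.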
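Based on the second comment, a possible application-side workaround is to replace `time.sleep(5)` in the reproduction loop with `BlockingConnection.sleep(5)`, which keeps pika's I/O loop ticking while waiting. The sketch below assumes that `conn` is a `pika.BlockingConnection` and `chan` is one of its channels. The helper name `publish_loop` and its `max_publishes` parameter are hypothetical. The function does not import pika; it simply lets whatever exception `conn.sleep()` raises propagate to the caller.

```python
def publish_loop(conn, chan, exchange, routing_key, body, interval=5.0, max_publishes=None):
    """Hypothetical helper: publish `body` every `interval` seconds until an error occurs.

    Assumes `conn` behaves like pika.BlockingConnection. Unlike time.sleep(),
    conn.sleep() services the I/O loop, so pending callbacks (such as the
    connection-lost notification) get a chance to run and raise here.
    """
    published = 0
    while max_publishes is None or published < max_publishes:
        conn.sleep(interval)  # drives timers and callbacks; may raise on a lost connection
        print('publishing...')
        chan.basic_publish(exchange=exchange, routing_key=routing_key, body=body)
        published += 1
    return published
```

With the snippet from the issue, this would be called as `publish_loop(conn, chan, <exchange-name>, <routing-key>, data)`. The call can be wrapped in a `try`/`except pika.exceptions.AMQPError` block to break out and reconnect or exit. `conn.process_data_events(time_limit=0)` before each publish should have a similar effect, though this has not been verified against the scenario in this issue.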