When RabbitMQ is down calling a task is stuck forever (retry settings are ignored)
The original GitHub issue #735 was closed, but the problem I mentioned in https://github.com/celery/kombu/issues/735#issuecomment-327157158 was not resolved.
Recap: when calling a task while RabbitMQ is down, the call is stuck forever, ignoring my retry_policy or retry=False.
Here’s a minimal Python 2.7 example to reproduce the issue:
# tasks.py
import time
from celery import Celery

app = Celery('tasks', broker='amqp://guest@localhost//')

@app.task
def some_long_computation():
    print('In some_long_computation, about to sleep for 2 seconds')
    time.sleep(2)
    print('Exiting some_long_computation')

if __name__ == '__main__':
    print('>>> About to call some_long_computation task')
    # Disable retry:
    some_long_computation.apply_async(retry=False)
    # Or with a retry policy:
    # some_long_computation.apply_async(retry=True, retry_policy={
    #     'max_retries': 3, 'interval_start': 0, 'interval_step': 0.2, 'interval_max': 0.2})
    print('>>> After calling some_long_computation task')
Run the worker:
celery -A tasks worker --loglevel=info
Execute the task in another shell/session:
python tasks.py
The task will complete and exit successfully.
Now, stop RabbitMQ using:
sudo service rabbitmq-server stop
Execute the task again and you will see that it’s stuck, even though the call passed retry=False and should have raised an exception.
I re-tested it with celery and kombu 4.2.1 and 4.2.0, and in both versions it remained stuck and didn’t respect retry=False.
Supplying a retry policy also didn’t work (see the commented-out example in the code above).
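For reference, the behaviour I would expect from those retry_policy keys can be sketched in plain Python. This is only an illustration under my reading of the documented keys (the delay starts at interval_start, grows by interval_step, and is capped at interval_max); it is not kombu's implementation. The names publish_with_retry and broker_down are hypothetical, and socket.error stands in for whatever connection error the transport raises.

```python
import socket
import time


def publish_with_retry(publish, max_retries=3, interval_start=0,
                       interval_step=0.2, interval_max=0.2, sleep=time.sleep):
    """Call publish(); retry on connection errors, then give up and re-raise.

    Hypothetical helper: it mirrors the retry_policy keys from the example
    above, but it is a sketch of the expected behaviour, not kombu code.
    """
    retries = 0
    delay = interval_start
    while True:
        try:
            return publish()
        except socket.error:
            if retries >= max_retries:
                raise  # bounded: the caller finally sees the error
            sleep(delay)
            # Linear backoff, never waiting longer than interval_max.
            delay = min(delay + interval_step, interval_max)
            retries += 1


def broker_down():
    """Hypothetical stand-in for a publish attempt against a stopped broker."""
    raise socket.error('connection refused')
```

With max_retries=0 the first failure propagates immediately, which is what I would expect retry=False to do. The point is that the call is bounded in both cases instead of blocking forever.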
I tested this with Python 2.7.15 on Ubuntu 18.04.1. The pip freeze output of my test environment:
amqp==2.3.2
billiard==3.5.0.4
celery==4.2.1
kombu==4.2.1
pytz==2018.5
vine==1.1.4
Top GitHub Comments
Bump on this issue. This is fairly important to fix, so that everyone's servers don't blow up in the event of some intermittent downtime. Luckily, we caught this in a unit test.
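For anyone who wants a similar guard, here is a rough sketch of how a test can detect a call that never returns. It is an assumption about how such a check could look, not the test referred to above, and completes_within is a hypothetical helper name. It runs the call in a daemon thread and checks whether the thread finished within a time limit.

```python
import threading


def completes_within(func, timeout):
    """Return True if func() returns (or raises) within `timeout` seconds."""
    worker = threading.Thread(target=func)
    worker.daemon = True  # a hung call must not keep the test process alive
    worker.start()
    worker.join(timeout)  # wait at most `timeout` seconds
    return not worker.is_alive()
```

In a test this could wrap the publishing call, for example asserting that completes_within(lambda: some_long_computation.apply_async(retry=False), 10) is true while the broker is stopped. The 10-second limit is an arbitrary choice.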
Note that the original monkey patch I created at https://github.com/celery/kombu/issues/735#issuecomment-327595747 still works, but you need to update the kombu version check.
Here’s the updated code:
Usage: call this code once, before publishing, and your retry policy should be respected (if none is provided, the default is used, i.e. up to 3 retries).