Redeliver same tasks repeatedly with redis broker and gevent
Hi,
kombu==4.6.4 celery==4.3.0 redis==3.2.1
The visibility_timeout is ignored when using the redis broker with gevent. As a result, when multiple workers are launched with the redis broker and gevent, tasks are re-delivered repeatedly. The cause may be https://github.com/celery/kombu/pull/905.
def restore_visible(self, start=0, num=10, interval=10):
    ......
    ceil = time() - self.visibility_timeout
    ......
    env = _detect_environment()
    if env == 'gevent':
        ceil = time()
    visible = client.zrevrangebyscore(
        self.unacked_index_key, ceil, 0,
        start=num and start, num=num, withscores=True)
    for tag, score in visible or []:
        self.restore_by_tag(tag, client)
When env == 'gevent', ceil is changed from time() - self.visibility_timeout to time(). As a result, every task in the unacked set, including newly added ones, is fetched into visible and then re-delivered by restore_by_tag, which defeats the purpose of visibility_timeout. The function restore_visible in QoS is called by maybe_restore_messages:
def maybe_restore_messages(self):
    for channel in self._channels:
        if channel.active_queues:
            # only need to do this once, as they are not local to channel.
            return channel.qos.restore_visible(
                num=channel.unacked_restore_limit,
            )
maybe_restore_messages is in turn called from several other methods. In particular, register_with_event_loop schedules it to run every 10 seconds:
def register_with_event_loop(self, connection, loop):
    ...
    loop.call_repeatedly(10, cycle.maybe_restore_messages)
So launching multiple workers with the redis broker and gevent re-delivers all tasks in the unacked set repeatedly.
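To make the effect of the cutoff concrete, here is a minimal pure-Python sketch of the selection step. It does not use kombu or Redis: the dict unacked_index stands in for the unacked sorted set (tag mapped to the timestamp at which it was delivered), and the function restorable_tags, the tag names, and the env value 'default' are hypothetical names chosen for illustration. The only thing taken from the code quoted above is the ceil logic and the idea that entries with a score between 0 and ceil are restored.

```python
def restorable_tags(unacked_index, now, visibility_timeout, env):
    """Return the tags whose score lies in [0, ceil].

    This mimics the range query in the snippet quoted above; it is an
    illustration, not kombu's implementation.
    """
    ceil = now - visibility_timeout
    if env == 'gevent':
        # The override discussed in this issue: the cutoff becomes "now",
        # so the visibility timeout no longer filters anything out.
        ceil = now
    return sorted(
        tag for tag, score in unacked_index.items() if 0 <= score <= ceil
    )


# Hypothetical data: one message delivered two hours ago, one five seconds ago.
now = 1_000_000.0
visibility_timeout = 3600  # one hour, an assumed value for this example
unacked_index = {
    'old-task': now - 7200,
    'fresh-task': now - 5,
}

# Without the override, only the message older than the timeout qualifies.
print(restorable_tags(unacked_index, now, visibility_timeout, env='default'))

# With the gevent override, the five-second-old message qualifies as well.
print(restorable_tags(unacked_index, now, visibility_timeout, env='gevent'))
```

Under these assumptions, the first call selects only the message that has exceeded the timeout, while the second also selects a message that a worker picked up seconds ago and is most likely still processing. Since the restore runs periodically, such a message would be selected again on each later run for as long as it stays unacknowledged.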
Issue Analytics
- Created 4 years ago
- Comments: 14 (4 by maintainers)
Top GitHub Comments
Replace gevent with eventlet.
This still reproduces on latest versions, as the relevant code is still there:
https://github.com/celery/kombu/blob/b51d1d678e198a80d7e5fd95f32674c7d8e04a75/kombu/transport/redis.py#L197-L199
Whenever restore_visible is called in a gevent environment, it restores all unacked messages, regardless of their visibility timeout. While #905 tried to fix an issue with messages not being restored when a worker starts, it broke this functionality in all other cases (restore_visible is called periodically, not only when a worker starts).
In our case, we don't even use gevent based workers. But our API services do use gevent. When we use celery.control to monitor the status of Celery (we have an API to return queue status), it triggers restore_visible with this broken code. It took us a while to find this 😢