
Redeliver same tasks repeatedly with redis broker and gevent


Hi,

kombu==4.6.4 celery==4.3.0 redis==3.2.1

The visibility_timeout is ignored when using the redis broker with gevent. As a result, when multiple workers are launched with the redis broker and gevent, tasks are re-delivered repeatedly. The cause may be https://github.com/celery/kombu/pull/905.

    def restore_visible(self, start=0, num=10, interval=10):
        ...
        ceil = time() - self.visibility_timeout
        ...
        env = _detect_environment()
        if env == 'gevent':
            ceil = time()
        visible = client.zrevrangebyscore(
            self.unacked_index_key, ceil, 0,
            start=num and start, num=num, withscores=True)
        for tag, score in visible or []:
            self.restore_by_tag(tag, client)

When env == 'gevent', ceil is changed from time() - self.visibility_timeout to time(). As a result, every task in the unacked set, including newly added ones, is fetched into visible and then re-delivered by restore_by_tag, which bypasses visibility_timeout entirely. The restore_visible method of QoS is called by maybe_restore_messages:

    def maybe_restore_messages(self):
        for channel in self._channels:
            if channel.active_queues:
                # only need to do this once, as they are not local to channel.
                return channel.qos.restore_visible(
                    num=channel.unacked_restore_limit,
                )

However, maybe_restore_messages is itself called from several other places, notably register_with_event_loop, which schedules it every 10 seconds:

    def register_with_event_loop(self, connection, loop):
        ...
        loop.call_repeatedly(10, cycle.maybe_restore_messages)

So when multiple workers are launched with the redis broker and gevent, all tasks in the unacked set are re-delivered repeatedly.
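The effect of the gevent branch on the cutoff can be illustrated without a broker. The following is a minimal sketch, not kombu code: the helper restorable_tags, the sample tags, and the 3600-second timeout are hypothetical, and it assumes that each score in the unacked index is the time at which the message was delivered. A plain dict stands in for the Redis sorted set, and the filter mimics the score range [0, ceil] passed to zrevrangebyscore.

```python
VISIBILITY_TIMEOUT = 3600  # seconds; assumed value for illustration only


def restorable_tags(unacked_index, now, visibility_timeout, env):
    """Return the tags whose score falls in [0, ceil].

    Hypothetical stand-in for the zrevrangebyscore call quoted above;
    unacked_index maps delivery tag -> delivery timestamp (assumption).
    """
    ceil = now - visibility_timeout
    if env == 'gevent':
        ceil = now  # the branch quoted in the issue
    return sorted(
        tag for tag, score in unacked_index.items() if 0 <= score <= ceil
    )


now = 10_000.0
unacked_index = {
    'old-task': now - 2 * VISIBILITY_TIMEOUT,  # delivered two timeouts ago
    'fresh-task': now - 5,                     # delivered five seconds ago
}

default_result = restorable_tags(unacked_index, now, VISIBILITY_TIMEOUT, 'default')
gevent_result = restorable_tags(unacked_index, now, VISIBILITY_TIMEOUT, 'gevent')

print(default_result)  # ['old-task']
print(gevent_result)   # ['fresh-task', 'old-task']
```

With the default cutoff only the task that has exceeded the timeout is selected. With the gevent cutoff the task delivered five seconds earlier is selected as well, which matches the behaviour described in this issue.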

Issue Analytics

Created 4 years ago
Comments: 14 (4 by maintainers)

Top GitHub Comments

4 reactions

mayouzi commented, May 15, 2020

Replace gevent with eventlet.

4 reactions

arikfr commented, Oct 6, 2019

This still reproduces on the latest versions, as the relevant code is still there:

https://github.com/celery/kombu/blob/b51d1d678e198a80d7e5fd95f32674c7d8e04a75/kombu/transport/redis.py#L197-L199

Whenever restore_visible is called in a gevent environment, it restores all unacked messages regardless of their visibility timeout. While #905 tried to fix messages not being restored when a worker starts, it broke the visibility timeout in all other cases (restore_visible is called periodically, not only when a worker starts).
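To make the distinction between the startup call and the periodic calls concrete, here is a hypothetical sketch of a cutoff that ignores the timeout only at worker startup. This is not the kombu implementation and not a fix proposed in this thread: the function restore_cutoff and its at_startup flag are invented purely for illustration.

```python
def restore_cutoff(now, visibility_timeout, env, at_startup):
    """Hypothetical cutoff: skip the timeout only for gevent at startup.

    The at_startup flag does not exist in kombu; it is an assumption
    used to contrast the startup call with the periodic calls.
    """
    ceil = now - visibility_timeout
    if env == 'gevent' and at_startup:
        ceil = now  # restore everything once, when the worker starts
    return ceil


startup_cutoff = restore_cutoff(100.0, 30, 'gevent', at_startup=True)
periodic_cutoff = restore_cutoff(100.0, 30, 'gevent', at_startup=False)

print(startup_cutoff)   # 100.0
print(periodic_cutoff)  # 70.0
```

Under this assumption the periodic calls would honour visibility_timeout again, while the startup call would keep the behaviour that #905 was aiming for.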

In our case, we don’t even use gevent-based workers, but our API services do use gevent. When we use celery.control to monitor the status of Celery (we have an API that returns queue status), it triggers restore_visible with this broken code. It took us a while to find this 😢