
Redeliver same tasks repeatedly with redis broker and gevent


Hi,

kombu==4.6.4 celery==4.3.0 redis==3.2.1

The visibility_timeout is ignored when using the redis broker with gevent. As a result, when launching multiple workers with the redis broker and gevent, tasks are re-delivered repeatedly. The cause appears to be https://github.com/celery/kombu/pull/905:

    def restore_visible(self, start=0, num=10, interval=10):
        # ...
        ceil = time() - self.visibility_timeout
        # ...
        env = _detect_environment()
        if env == 'gevent':
            ceil = time()
        visible = client.zrevrangebyscore(
            self.unacked_index_key, ceil, 0,
            start=num and start, num=num, withscores=True)
        for tag, score in visible or []:
            self.restore_by_tag(tag, client)

When env == 'gevent', ceil is changed from time() - self.visibility_timeout to time(). As a result, all tasks in the unacked set, even newly added ones, are fetched into visible and then re-delivered via restore_by_tag, bypassing visibility_timeout entirely. The restore_visible method of QoS is called by maybe_restore_messages:

    def maybe_restore_messages(self):
        for channel in self._channels:
            if channel.active_queues:
                # only need to do this once, as they are not local to channel.
                return channel.qos.restore_visible(
                    num=channel.unacked_restore_limit,
                )

But maybe_restore_messages is itself called from several places, multiple times. In particular, register_with_event_loop schedules it to run every 10 seconds:

    def register_with_event_loop(self, connection, loop):
        ...
        loop.call_repeatedly(10, cycle.maybe_restore_messages)

So launching multiple workers with the redis broker and gevent re-delivers all the tasks in the unacked set repeatedly.
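
For illustration, here is a minimal, hypothetical sketch of the setup being described (module name, task, and broker URL are made up, not taken from the issue). With two or more workers started against this app using the gevent pool, the task below is re-delivered long before the one-hour visibility_timeout expires, because the quoted branch resets ceil to time():

    import time

    from celery import Celery

    app = Celery('repro', broker='redis://localhost:6379/0')

    # Ask the redis transport to re-deliver unacked messages only after an hour.
    app.conf.broker_transport_options = {'visibility_timeout': 3600}

    # acks_late keeps the message in the unacked set for the whole task run,
    # which is where restore_visible picks it up again.
    app.conf.task_acks_late = True

    @app.task
    def slow_task():
        time.sleep(60)  # still unacked whenever restore_visible runs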

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 14 (4 by maintainers)

Top GitHub Comments

4 reactions
mayouzi commented, May 15, 2020

replace gevent with eventlet
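
For context (the project name below is illustrative): the pool implementation is selected when the worker is launched, so this workaround amounts to starting the workers with the eventlet pool instead of the gevent one, since the branch quoted above only triggers when _detect_environment() reports 'gevent':

    celery -A proj worker --pool=eventlet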

4 reactions
arikfr commented, Oct 6, 2019

This still reproduces on latest versions, as the relevant code is still there:

https://github.com/celery/kombu/blob/b51d1d678e198a80d7e5fd95f32674c7d8e04a75/kombu/transport/redis.py#L197-L199

Whenever restore_visible is called in a gevent environment, it restores all unacked messages, regardless of their visibility timeout. While #905 tried to fix an issue with messages not being restored when the worker starts, it broke this functionality in all other cases (restore_visible is called periodically, not only when the worker starts).

In our case, we don’t even use gevent-based workers. But our API services do use gevent. When we use celery.control to monitor the status of Celery (we have an API that returns queue status), it triggers restore_visible with this broken code. It took us a while to find this 😢
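
To illustrate the pattern this comment describes, here is a hedged sketch (app name, broker URL, and function name are illustrative) of a read-only monitoring call made from a gevent-patched API process; per the comment above, this kind of celery.control usage is enough to trigger restore_visible on the redis transport:

    from celery import Celery

    # Illustrative app; in practice this is the project's existing Celery app.
    app = Celery('api', broker='redis://localhost:6379/0')

    def queue_status():
        # Read-only inspection issued from a gevent-patched web process.
        insp = app.control.inspect(timeout=1.0)
        return {
            'active': insp.active(),
            'reserved': insp.reserved(),
        }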

