
Using task_done() in multiple threads

See original GitHub issue

I’d like to use Queue to store items to be processed by threads. However, if one of the items fails to get processed (and task_done is therefore not called), the item can still be removed from the persisted queue, whereas one would expect it to remain, as is the usual behaviour.

Example:

import threading
import time

from persistqueue import Queue

q = Queue("testq")


def worker1():
    print("getting from worker1")
    x = q.get()
    print("got", x, "from worker1")
    # processing goes here ... takes some time
    time.sleep(2)
    try:
        assert False, "something went wrong"  # simulate a processing failure
        q.task_done()
    except Exception:
        print("something went wrong with worker1 in processing", x, "so not calling task_done")


def worker2():
    time.sleep(1)
    print("getting from worker2")
    x = q.get()
    print("got", x, "from worker2")
    # processing would happen here - but happens quicker than task1
    print("finished processing", x, "from worker2 so calling task_done")
    q.task_done()
    print("called task_done from worker2")


if __name__ == "__main__":

    q.put("a")
    q.put("b")

    t1 = threading.Thread(target=worker1)
    t1.start()
    t2 = threading.Thread(target=worker2)
    t2.start()
    t1.join()
    t2.join()
    print("reloading q")
    del q
    q = Queue("testq")
    print("qsize", q.qsize())

Output:

getting from worker1
got a from worker1
getting from worker2
got b from worker2
finished processing b from worker2 so calling task_done
called task_done from worker2
something went wrong with worker1 in processing a so not calling task_done
reloading q
qsize 0

As you can see, 'a' was permanently removed even though task_done was never called for it. In other words, I’d expect the output to be qsize 1. Is there a way to achieve this, i.e. for task_done to complete only a specific task, rather than all outstanding tasks across all threads?

Bonus question: how do I also add 'a' back onto the in-memory queue (ignoring persistence)? I.e. the equivalent of SQLiteAckQueue.nack? The only way I see how would be reloading the queue from disk (in which case the get wouldn’t have persisted) but this seems messy.

(Also, yes, I know of the SQLiteAckQueue which seems well-suited, but I’d prefer to use plain files if possible.)
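For the bonus question: persist-queue's SQLiteAckQueue provides per-item ack()/nack() methods, which is the semantics being asked for. If SQLite is off the table, the bookkeeping it performs can be sketched in memory along these lines (MiniAckQueue and everything in it is a hypothetical illustration, not persist-queue API, and it does no persistence at all):

```python
# Minimal in-memory sketch of per-item ack/nack semantics, similar in
# spirit to persistqueue.SQLiteAckQueue. All names here are hypothetical.
import threading
from collections import deque

class MiniAckQueue:
    def __init__(self):
        self._items = deque()
        self._unacked = {}  # id(item) -> item: taken but not yet acked
        self._lock = threading.Lock()

    def put(self, item):
        with self._lock:
            self._items.append(item)

    def get(self):
        with self._lock:
            item = self._items.popleft()
            self._unacked[id(item)] = item
            return item

    def ack(self, item):
        # Item fully processed: forget it for good.
        with self._lock:
            del self._unacked[id(item)]

    def nack(self, item):
        # Processing failed: put the item back at the head of the queue.
        with self._lock:
            del self._unacked[id(item)]
            self._items.appendleft(item)

    def qsize(self):
        with self._lock:
            return len(self._items)

q = MiniAckQueue()
q.put("a")
q.put("b")
x = q.get()
assert x == "a"
q.nack(x)                 # worker failed: "a" goes back to the front
assert q.qsize() == 2
assert q.get() == "a"     # FIFO order is preserved
```

Because a nacked item goes back to the head of the deque rather than the tail, a failed item is retried before newer work, which keeps FIFO intact.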

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

3 reactions
peter-wangxu commented, Jan 30, 2019

@kodonnell The SQLite ack queue should fit your case well, since you have a strict FIFO requirement; I strongly suggest you try it.

The file queue data is written sequentially, and it’s hard to implement ACK for part of its content. If you have any ideas, just post them here.

Thanks Peter

0 reactions
avineshwar commented, Sep 6, 2022

this is a known limitation of the file queue

Sorry, I wasn’t aware. Can this be documented? It’s described as “thread-safe” and this doesn’t really fit that bill. Also - doesn’t the same apply to the sqlite queue? (I assume that’s what the SQLiteAckQueue is for.)

you should re-enqueue the failed items so that they can be processed later; can this fit your case?

Ah - so you mean every time I .get() I follow it with .task_done() - and then if a fail happens, I requeue it? This should work, though the FIFO order wouldn’t be preserved - which isn’t too much of a drama for us, actually.
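The pattern described above can be sketched with the stdlib queue.Queue; the same shape should apply to persistqueue.Queue (accessing q.queue at the end is stdlib-only introspection, used here just for the assertion):

```python
# Sketch of the "mark done immediately, requeue on failure" pattern:
# task_done() always balances the get(), so nothing is left outstanding,
# and a failed item is put back rather than lost.
import queue

def process_once(q, handle):
    """Take one item; on failure, requeue it instead of losing it."""
    item = q.get()
    try:
        handle(item)
    except Exception:
        q.put(item)    # the item survives, but loses its FIFO position
    finally:
        q.task_done()  # always balance the get()

q = queue.Queue()
q.put("a")
q.put("b")

def handle(item):
    if item == "a":
        raise RuntimeError("simulated failure")

process_once(q, handle)   # "a" fails and is requeued behind "b"
process_once(q, handle)   # "b" succeeds
assert list(q.queue) == ["a"]
```

As noted in the reply, the cost of this approach is that a requeued item re-enters at the tail, so strict FIFO order is not preserved.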

Late here, but use multiple similar queues with dedicated roles (i.e. not just one queue but two additional compensating queues: one queue for successes and two queues for handling failure scenarios); put items accordingly so that you can jump / swap between them. That should ensure FIFO with some additional load.
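The multi-queue suggestion is terse, so here is one possible reading of it (a sketch, not necessarily the commenter's exact design): a main queue plus a dedicated retry queue, where consumers drain the retry queue first so failed items are reprocessed ahead of newer work. Shown with the stdlib queue.Queue; a persistent variant could swap in persistqueue.Queue instances with separate on-disk paths.

```python
# Main queue plus a compensating retry queue. Consumers prefer the retry
# queue, so a failed item is retried before any newer item, which
# approximates FIFO across failures.
import queue

main_q, retry_q = queue.Queue(), queue.Queue()

def get_next():
    """Prefer the retry queue so failed items keep their priority."""
    try:
        return retry_q.get_nowait(), retry_q
    except queue.Empty:
        return main_q.get_nowait(), main_q

def process(handle):
    item, src = get_next()
    try:
        handle(item)
    except Exception:
        retry_q.put(item)   # park the failure for a later pass
    finally:
        src.task_done()

main_q.put("a")
main_q.put("b")

calls = []
def handle(item):
    calls.append(item)
    if item == "a" and calls.count("a") == 1:
        raise RuntimeError("simulated first-attempt failure")

process(handle)   # "a" fails and moves to retry_q
process(handle)   # "a" is retried before "b"
process(handle)   # "b"
assert calls == ["a", "a", "b"]
```

A handler that fails permanently would bounce between the queues forever, so a real version would also track a per-item retry count before giving up.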
