Requests seems to get stuck when used with futures.ThreadPoolExecutor


Hello,

I’m using Python 2.7.9 with futures (3.0.3) and requests (2.7.0) on Debian (also tested on Windows 8, with the same results).

The problem is that Requests doesn’t time out and gets stuck, so my threads never finish their jobs and the queue stops being processed.

I’m trying to build a multi-threaded web crawler. I fetch the URLs to be crawled from a frontier (which returns a JSON list of domains) and populate a queue with them.

After this, I populate the thread pool with the code below:

while not url_queue.empty():
    queue_data = url_queue.get()
    task_pool.submit(processItem, queue_data)

In the processItem() function, I fetch the URL with get_data() and mark the queue item as done with task_done().
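
One thing worth checking in a setup like this is that task_done() runs even when the worker raises, because queue.join() blocks until every item has been marked done. Below is a minimal sketch of that pattern, written for Python 3 (on Python 2 the module is Queue and ThreadPoolExecutor comes from the futures backport). The names process_item, crawl, and flaky_fetch are hypothetical stand-ins, not the original processItem() or get_data().

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def process_item(url_queue, item, fetch):
    """Run fetch(item) and always mark the queue item as done."""
    try:
        return fetch(item)
    except Exception:
        # Report a failure as an empty body, mirroring get_data().
        return ""
    finally:
        # Runs on success and on failure, so url_queue.join() can return.
        url_queue.task_done()


def crawl(domains, fetch, max_workers=4):
    url_queue = queue.Queue()
    for domain in domains:
        url_queue.put(domain)

    futures = []
    with ThreadPoolExecutor(max_workers=max_workers) as task_pool:
        while not url_queue.empty():
            item = url_queue.get()
            futures.append(task_pool.submit(process_item, url_queue, item, fetch))
        url_queue.join()  # returns once every item has been marked done
    return [f.result() for f in futures]


def flaky_fetch(domain):
    # Hypothetical stand-in for get_data(); fails for one domain.
    if domain == "bad.example":
        raise ValueError("simulated failure")
    return "body of " + domain


results = crawl(["a.example", "bad.example", "b.example"], flaky_fetch)
```

With the finally clause in place, a failing item still counts as done, so join() is only held up by a fetch that never returns at all.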

My get_data() function is as follows:

import requests

def get_data(fqdn):
    # headers is assumed to be a dict of request headers defined elsewhere in the script.
    try:
        response = requests.get("http://" + fqdn, headers=headers, allow_redirects=True, timeout=3)

        if response.status_code == requests.codes.ok:
            result = response.text
        else:
            result = ""

    except requests.exceptions.RequestException as e:
        print("ERROR OCCURRED:")
        print(fqdn)
        print(e)  # printing the exception itself works on Python 2 and 3
        result = ""

    return result

If I comment out the get_data() call in processItem(), all threads and the queue work fine. If I uncomment it, most requests work fine, but some get stuck, and that blocks the whole queue and script because queue.join() waits for the threads to complete their requests. I suspect it’s a bug in the requests module, since everything works fine without calling get_data() and requests doesn’t time out the GET request.
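
For context, the requests documentation describes timeout as a limit on the connection attempt and on each wait for data from the server, not on the total duration of the request, so a server that keeps trickling bytes may hold a worker well past 3 seconds. One way to keep the main thread from waiting indefinitely is to bound the wait on each future. The sketch below assumes Python 3 and uses a hypothetical slow_fetch stand-in that sleeps instead of making a real HTTP call; fetch_all is likewise an illustrative name.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def slow_fetch(domain, delay):
    # Hypothetical stand-in for get_data(); sleeps instead of doing HTTP.
    time.sleep(delay)
    return "body of " + domain


def fetch_all(jobs, wait_seconds):
    """Return {domain: body}, using "" for jobs that exceed wait_seconds."""
    results = {}
    pool = ThreadPoolExecutor(max_workers=max(1, len(jobs)))
    try:
        futures = {domain: pool.submit(slow_fetch, domain, delay)
                   for domain, delay in jobs}
        for domain, future in futures.items():
            try:
                results[domain] = future.result(timeout=wait_seconds)
            except FutureTimeout:
                # The caller stops waiting; the worker thread keeps running.
                results[domain] = ""
    finally:
        # Do not block here on workers that are still busy.
        pool.shutdown(wait=False)
    return results


results = fetch_all([("fast.example", 0.0), ("slow.example", 1.0)], wait_seconds=0.3)
```

This only bounds how long the caller waits. The stuck thread is not cancelled and still occupies a pool slot until its request returns, so it complements the request-level timeout rather than replacing it.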

Any help will be greatly appreciated… Thank you very much…

Issue Analytics

  • State: closed
  • Created 8 years ago
  • Reactions: 2
  • Comments: 24 (11 by maintainers)

Top GitHub Comments

2 reactions
sigmavirus24 commented, Aug 27, 2016

@metrue to maintain a thread-safe/multiprocess-safe queue, you can use the standard library’s Queue implementation. If you’re on Python 2:

import Queue

task_queue = Queue.Queue()

If you’re on Python 3:

import queue

task_queue = queue.Queue()
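
The two imports above can be folded into one block that works on either version. This is a small sketch: the task_queue name follows the comment, while the sample domains and the drained list are made up for illustration.

```python
try:
    import queue  # Python 3
except ImportError:
    import Queue as queue  # Python 2

task_queue = queue.Queue()
for domain in ["a.example", "b.example"]:
    task_queue.put(domain)

# Drain without blocking; get_nowait() raises queue.Empty when nothing is left.
drained = []
while True:
    try:
        drained.append(task_queue.get_nowait())
    except queue.Empty:
        break
```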

1 reaction
Lukasa commented, Aug 27, 2016

If you are using a process pool executor you must not use a Session that is shared across those processes.
