Requests seems to get stuck when used with futures.ThreadPoolExecutor
Hello,
I’m using Python 2.7.9 with futures (3.0.3) and requests (2.7.0) on Debian (also tested on Win8; the results are the same).
The problem is that Requests doesn’t time out and gets stuck, so my threads never finish their jobs and stop processing the queue.
I’m building a multi-threaded web crawler: I fetch to-be-crawled URLs from the frontier (which returns a JSON list of domains) and populate a queue with them.
After this, I populate the thread pool with the code below:
while not url_queue.empty():
    queue_data = url_queue.get()
    task_pool.submit(processItem, queue_data)
In the processItem() function, I fetch the URL with get_data() and mark the queue item as done with task_done().
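For reference, a minimal sketch of what that worker might look like (the body of processItem() is not shown in the issue, so its exact shape here is an assumption; url_queue, get_data(), and task_done() are the names used above):

def processItem(fqdn):
    # Hypothetical reconstruction: fetch the page, then always mark the
    # queue item as done so queue.join() can eventually return even if
    # the fetch raises or hangs until its own timeout.
    try:
        page = get_data(fqdn)
        # ... parse page / extract links here ...
    finally:
        url_queue.task_done()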
My get_data() function is as follows
def get_data(fqdn):
    try:
        response = requests.get("http://" + fqdn, headers=headers, allow_redirects=True, timeout=3)
        if response.status_code == requests.codes.ok:
            result = response.text
        else:
            result = ""
    except requests.exceptions.RequestException as e:
        print "ERROR OCCURRED:"
        print fqdn
        print e.message
        result = ""
    return result
If I comment out the call to get_data() in processItem(), all the threads and the queue work fine. If I uncomment it, most requests work, but some hang, and that stalls the whole queue and the script because queue.join() waits for the threads to finish their requests. I suspect this is a bug in the requests module, since everything works without calling get_data() and requests never times out the GET request.
Any help will be greatly appreciated… Thank you very much…
@metrue to maintain a thread-safe/multiprocess-safe queue, you can use the standard library’s Queue implementation: the Queue module if you’re on Python 2, or the queue module if you’re on Python 3.
If you are using a process pool executor, you must not use a Session that is shared across those processes.
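A minimal sketch of that advice, assuming each worker fetches a single domain (the fetch() helper and the example domains are hypothetical):

import requests
from concurrent.futures import ProcessPoolExecutor

def fetch(fqdn):
    # Build a fresh Session inside the worker process instead of
    # sharing one Session object across processes.
    with requests.Session() as session:
        return session.get("http://" + fqdn, timeout=3).status_code

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(fetch, ["example.com", "example.org"])))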