Threads calling S3 operations return RuntimeError (cannot schedule new futures after interpreter shutdown)
Describe the bug
Basic S3 operations, such as downloading or uploading files to buckets, result in a RuntimeError when used in threaded Python 3 application methods. No existing bug reports cover this, so this issue documents the error and requests a recommended workaround, if available.
Background: Python 3.8 introduced changes to how the concurrent.futures module handles executor requests, preventing new tasks from being scheduled after the executor receives a shutdown signal. These changes caused (at least some) Boto3 versions after 1.17.53 to yield the following exception:
cannot schedule new futures after interpreter shutdown
Traceback (most recent call last):
  File "<some_file_calling_an_s3_operation>.py", line 277, in <method_calling_an_s3_operation>
    s3_client.download_file(bucket_name, file_key, file_destination)
  File "/usr/local/lib/python3.9/site-packages/boto3/s3/inject.py", line 170, in download_file
    return transfer.download_file(
  File "/usr/local/lib/python3.9/site-packages/boto3/s3/transfer.py", line 304, in download_file
    future = self._manager.download(
  File "/usr/local/lib/python3.9/site-packages/s3transfer/manager.py", line 369, in download
    return self._submit_transfer(
  File "/usr/local/lib/python3.9/site-packages/s3transfer/manager.py", line 500, in _submit_transfer
    self._submission_executor.submit(
  File "/usr/local/lib/python3.9/site-packages/s3transfer/futures.py", line 467, in submit
    future = ExecutorFuture(self._executor.submit(task))
  File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 163, in submit
    raise RuntimeError('cannot schedule new futures after '
RuntimeError: cannot schedule new futures after interpreter shutdown
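For reference, the same guard in concurrent.futures can be triggered in isolation with the standard library alone; the interpreter-shutdown message in the traceback above is a variant of this check (the executor here is illustrative, not s3transfer's internal one):

```python
from concurrent.futures import ThreadPoolExecutor

# Once an executor has been shut down, submit() refuses new work with a
# RuntimeError -- the same guard s3transfer trips during interpreter shutdown.
executor = ThreadPoolExecutor(max_workers=1)
executor.shutdown(wait=True)
try:
    executor.submit(print, "hello")
except RuntimeError as err:
    print(err)  # cannot schedule new futures after shutdown
```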
This impacted Apache Airflow to the extent that their fix was to disable threading in S3 operations, and other related bug reports exist; the error has appeared sporadically in similar scenarios.
This ticket seeks guidance from the Boto3 team on how best to deal with this issue. (NOTE: Recommendations online suggest reverting to Boto3 1.17.53 [see above]. Other potential solutions are disabling threading in S3 operations using TransferConfig, or calling Thread.join() on the topmost thread, though joining introduces waits and may not be readily possible, depending on the architecture.)
Steps to reproduce
This was reproduced with the following application setup:
Python 3.9.9
CentOS 7
botocore==1.20.112
boto3==1.17.112
Example Code:
#!/usr/bin/python3
import logging
from queue import Queue
import threading
import time

DEFAULT_QUEUE_SIZE = 100  # illustrative; the real value is application-specific

log = logging.getLogger(__name__)

def finalizer(some_queue):
    while True:  # loop to catch all items
        time.sleep(0.05)  # poor man's nice
        if not some_queue.empty():
            try:
                # application logic here
                method_that_performs_s3_operations()
                # application logic here
            except BaseException as be:
                log.exception(be)
    return

def processor(base_queue, some_queue):
    while True:  # loop to catch all items
        time.sleep(0.05)  # poor man's nice
        if not base_queue.empty():
            try:
                # application logic here
                method2_that_performs_s3_operations()
                add_to_some_queue()
                # application logic here
            except BaseException as be:
                log.exception(be)
    return

def collector(base_queue):
    while True:  # loop to catch all items
        time.sleep(0.05)  # poor man's nice
        if not base_queue.full():
            try:
                # application logic here
                add_to_base_queue()
                # application logic here
            except BaseException as be:
                log.exception(be)
    return

def main():
    base_queue = Queue(DEFAULT_QUEUE_SIZE)
    some_queue = Queue(DEFAULT_QUEUE_SIZE * 2)
    # define and run threads (note the trailing commas: args must be tuples)
    thread_collector = threading.Thread(target=collector, name='thread_collector',
                                        args=(base_queue,))
    thread_processor = threading.Thread(target=processor, name='thread_processor',
                                        args=(base_queue, some_queue))
    thread_finalizer = threading.Thread(target=finalizer, name='thread_finalizer',
                                        args=(some_queue,))
    # wait a specific time before starting the processing threads
    time.sleep(30.0)
    thread_collector.start()
    thread_processor.start()
    thread_finalizer.start()
    return

if __name__ == '__main__':
    main()
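One way to apply the Thread.join() workaround to a layout like the one above is to have main() wait on each worker before returning, so the interpreter does not begin shutting down while S3 transfers may still be scheduled. The worker body here is a stand-in for the application logic:

```python
import threading
import time

def worker(results, item):
    # Stand-in for a thread body that would call s3_client.download_file(...)
    time.sleep(0.05)
    results.append(item)

def main():
    results = []
    threads = [
        threading.Thread(target=worker, name=f"worker-{i}", args=(results, i))
        for i in range(3)
    ]
    for t in threads:
        t.start()
    # join() keeps the main thread alive until every worker finishes, so
    # no executor work is submitted during interpreter shutdown.
    for t in threads:
        t.join()
    return sorted(results)

if __name__ == '__main__':
    print(main())  # [0, 1, 2]
```

The cost, as noted above, is that main() blocks until all workers exit, which may not suit long-lived daemon-style threads.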
Expected behavior
S3 operations proceed successfully, downloading/uploading without any custom configuration, and no concurrency-related exceptions are thrown inside the S3 code.
Debug logs
A full stack trace can be captured by adding boto3.set_stream_logger('') to your code.
Issue Analytics: created 2 years ago · 3 reactions · 5 comments (2 by maintainers)
Top GitHub Comments
Hi @jpl-jengelke, thanks for reaching out. I brought this up with the team and it is something that we’re looking into further. We will let you know when we have an update.
@tim-finnigan Any update on this?