Hang in s3.download_file with Celery worker in version 1.4.0
I’ve been using lots of boto3 calls in my Flask app for some time, but switching to the latest boto3, v1.4.0, has broken my Celery workers. One thing that may be unique about my app is that I use S3 to download a secure environment-variables file before launching my app or workers. The new boto3 appears to work with my app, but it hangs when launching the Celery worker.
I would temporarily downgrade boto3 to avoid the problem, but it’s been a long time since the last release, and I need the elbv2 support that is only available in 1.4.0.
I’ve created a tiny version of my worker (worker2.py) to demonstrate the problem. I’ve verified that with the previous boto3 version, 1.3.1, the worker launches properly: I see all the prints and the Celery worker banner output.
If I install boto3 1.4.0, the second print() statement (“Download complete!”) is never reached. I also tried following the new documentation example, using boto3.resource and s3.meta.client, but that fails as well.
#
# Stub Celery worker to demonstrate bug in Boto3 1.4.0. Works fine with previous version Boto3 1.3.1.
# Test with: celery worker -A worker2.celery
#
from flask import Flask
from celery import Celery
import boto3
import tempfile
celery = Celery(__name__, broker='amqp://guest:guest@localhost:5672//')
app = Flask(__name__)
s3 = boto3.client('s3', region_name='us-west-1')
env_file = 'APPNAME.APPSTAGE.env'
with tempfile.NamedTemporaryFile() as s3_file:
    print("Downloading file...")
    # download_file() returns None; it writes the object to s3_file.name
    s3.download_file('APPBUCKET', env_file, s3_file.name)
    print("Download complete!")
You can test it by running the following at the command line:
celery worker -A worker2.celery
Note that simply running the script directly downloads the file fine with 1.4.0:
python worker2.py
Issue created 7 years ago.
Top GitHub Comments
@ask, the change is that download_file() (which is a method to download an object from S3) is now multithreaded. The client creation/initialization that happens when creating a client via boto3.client() does not use threads. More nuanced is that in previous versions of boto3, we had a conditional that was roughly:
This meant that in previous versions of boto3, if you stayed under 8MB, downloading a file would never spin up threads. Above 8MB, however, it seems you’d still run into this problem in older versions of boto3.
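The size-based branch described above can be sketched roughly as follows. This is a simplified model, not boto3’s actual code: the function download(), the fetch_range callback, and the chunk size and worker count are hypothetical; only the 8MB cutoff comes from the comment.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

# The 8MB cutoff mentioned in the comment; everything else here is hypothetical.
MULTIPART_THRESHOLD = 8 * 1024 * 1024


def download(object_size, fetch_range, chunk_size=MULTIPART_THRESHOLD, max_workers=4):
    """Hypothetical model of the pre-1.4.0 behaviour: one request on the
    calling thread below the threshold, parallel ranged requests on worker
    threads at or above it."""
    if object_size < MULTIPART_THRESHOLD:
        # Small object: a single request on the caller's thread; no threads created.
        return fetch_range(0, object_size)
    # Large object: split into byte ranges and fetch them concurrently.
    ranges = [
        (start, min(start + chunk_size, object_size))
        for start in range(0, object_size, chunk_size)
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        parts = pool.map(lambda r: fetch_range(*r), ranges)
        return b"".join(parts)
```

Passing a fake fetch_range that records which thread it ran on shows the difference: a 1KB object is fetched on the main thread, while a 16MB+ object is fetched entirely on worker threads.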
In the latest version of boto3, downloading a file from S3 via the download_file() method will always use threads. We’re still investigating if there’s anything we can do on our end to improve this.
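For contrast, here is a hypothetical model of the 1.4.0 behaviour described above, where even a tiny object is fetched on a worker thread, together with a rough diagnostic helper that counts how many threads a call starts. Both names (always_threaded_download, count_threads_started) are illustrative assumptions, not boto3 APIs, and the helper only sees threads started through Python’s threading module.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def always_threaded_download(object_size, fetch_range, max_workers=4):
    """Hypothetical model of the 1.4.0 behaviour: regardless of size, the
    object is fetched on a worker thread, never on the caller's thread."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        future = pool.submit(fetch_range, 0, object_size)
        # The caller blocks here until the worker thread finishes.
        return future.result()


def count_threads_started(func, *args, **kwargs):
    """Rough diagnostic (hypothetical helper): call func and count the
    threading.Thread objects started during the call by temporarily
    wrapping Thread.start. Not thread-safe; for experiments only."""
    started = []
    original_start = threading.Thread.start

    def counting_start(self):
        started.append(self.name)
        return original_start(self)

    threading.Thread.start = counting_start
    try:
        result = func(*args, **kwargs)
    finally:
        # Always restore the real Thread.start, even if func raises.
        threading.Thread.start = original_start
    return result, len(started)
```

In principle the same wrapper could be pointed at s3.download_file to check whether a given boto3 version starts threads for a small object, but that has not been verified here.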
Hope that gives more context into what’s going on.
I am able to reproduce it. I am looking into why it may be happening.