Hang in s3.download_file with Celery worker in version 1.4.0

I’ve been using lots of boto3 calls in my Flask app for some time, but the switch to the latest boto3 v1.4.0 has broken my Celery workers. Something that may be unique about my app is that I use S3 to download a secure environment variables file before launching my app or workers. It appears that the new boto3 works with my app, but hangs when launching the Celery worker.

I would temporarily downgrade boto3 to avoid the problem, but it's been a long time since the last release, and I need the elbv2 support that only comes in 1.4.0.

I’ve created a tiny version of my worker (worker2.py) to demonstrate the problem. I’ve verified that with the previous version, boto3 1.3.1, the worker launches properly: I see both prints and the Celery worker banner output.

If I install boto3 1.4.0, then the second print() statement “Download complete” is never reached. Also note that I tried following the new doc example with boto3.resource and using s3.meta.client, but that fails as well.

#
# Stub Celery worker to demonstrate bug in Boto3 1.4.0. Works fine with previous version Boto3 1.3.1.
# Test with: celery worker -A worker2.celery
#
from flask import Flask
from celery import Celery
import boto3
import tempfile

celery = Celery(__name__, broker='amqp://guest:guest@localhost:5672//')

app = Flask(__name__)

s3 = boto3.client('s3', region_name='us-west-1')
env_file = 'APPNAME.APPSTAGE.env'
with tempfile.NamedTemporaryFile() as s3_file:
    print("Downloading file...")
    response = s3.download_file('APPBUCKET', env_file, s3_file.name)
    print("Download complete!")

You can test it by running the following at the command line:

celery worker -A worker2.celery

Also note that running the script directly, without Celery, downloads the file just fine with 1.4.0:

python worker2.py
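
One way to see where a hanging call is stuck is to arm a watchdog that dumps every thread's stack if the call does not return in time. The following is a minimal sketch using the standard-library faulthandler module (Python 3); run_with_watchdog and slow_call are hypothetical names, and slow_call merely stands in for the s3.download_file() call so the example runs without AWS access.

```python
import faulthandler
import tempfile
import time


def run_with_watchdog(func, timeout, log_file):
    """Call func(); if it runs longer than `timeout` seconds, dump every
    thread's stack to log_file. The call itself is not interrupted."""
    faulthandler.dump_traceback_later(timeout, file=log_file)
    try:
        return func()
    finally:
        # Disarm the watchdog once func() returns or raises.
        faulthandler.cancel_dump_traceback_later()


def slow_call():
    # Hypothetical stand-in for s3.download_file(...); it simply
    # sleeps past the watchdog timeout.
    time.sleep(0.5)
    return "done"


# faulthandler writes to the file descriptor directly, so use a real file.
with tempfile.TemporaryFile() as log:
    result = run_with_watchdog(slow_call, timeout=0.1, log_file=log)
    log.seek(0)
    stack_dump = log.read().decode()

print(result)
print(stack_dump)
```

In the worker, wrapping the download in run_with_watchdog with sys.stderr as the log file and a timeout of several seconds would show which thread is blocked and on which line.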

Top GitHub Comments

jamesls commented, Oct 6, 2016

@ask, the change is that download_file() (which is a method to download an object from S3) is now multithreaded. The client creation/initialization that happens when creating a client via boto3.client() does not use threads.

More nuanced is that in previous versions of boto3, we had a conditional in boto3 that was roughly:

def download_file(self, ...):  # downloads a file from S3 (pseudocode)
    file_size = get_file_size_from_s3()
    if file_size < 8 * 1024 * 1024:  # 8 MB
        download_in_this_thread_in_one_api_call()
    else:
        # use a concurrent.futures.ThreadPoolExecutor() and
        # download the file chunks in parallel

This meant that in previous versions of boto3, if you stayed under 8 MB, downloading a file would never spin up threads. Above 8 MB, however, it seems you’d still run into this problem in older versions of boto3.
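
The size check described above can be illustrated with a self-contained sketch. This is not boto3's code: fetch_range, download, and CHUNK_SIZE are hypothetical, and an in-memory bytes object stands in for the S3 object. It only shows that the small path stays in the calling thread while the large path runs in pool threads.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

THRESHOLD = 8 * 1024 * 1024   # the 8 MB cutoff mentioned above
CHUNK_SIZE = 1024 * 1024      # hypothetical chunk size for this sketch


def fetch_range(blob, start, end):
    """Stand-in for one ranged GET; records which thread served it."""
    return threading.current_thread().name, blob[start:end]


def download(blob):
    """Return (data, thread_names) using a size check like the old one."""
    if len(blob) < THRESHOLD:
        # Small object: a single call in the calling thread.
        name, data = fetch_range(blob, 0, len(blob))
        return data, {name}
    # Large object: fetch fixed-size chunks in a thread pool.
    ranges = [(start, min(start + CHUNK_SIZE, len(blob)))
              for start in range(0, len(blob), CHUNK_SIZE)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(lambda r: fetch_range(blob, *r), ranges))
    data = b"".join(chunk for _, chunk in results)
    return data, {name for name, _ in results}


small = b"x" * 1024
large = bytes(THRESHOLD + 1)

small_data, small_threads = download(small)
large_data, large_threads = download(large)
print(small_threads)   # only the calling thread
print(large_threads)   # worker threads from the pool
```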

In the latest version of boto3, downloading a file from S3 via the download_file() method will always use threads.

We’re still investigating if there’s anything we can do on our end to improve this.

Hope that gives more context into what’s going on.

kyleknap commented, Oct 5, 2016

I am able to reproduce it. I am looking into why it may be happening.
