Connections to S3 much slower than with boto2?
I have developed a web application with boto (v2.36.0) and am trying to migrate it to use boto3 (v1.1.3). Because the application is deployed on a multi-threaded server, I connect to S3 for each HTTP request/response interaction.
Following the guidance here, I am generating a session per request to ensure thread-safety. However, I have noticed that whereas boto2 takes approximately 0.2 ms to connect to S3, boto3 takes approximately 25 ms.
Here is an IPython session that demonstrates the issue. This was run in IPython 3.0.0 on Python 3.4.3 (Anaconda 2.2.0, 64-bit) on Ubuntu 14.04 LTS on an AWS EC2 instance in us-east-1; however, I have noted similar behavior in Python 2.7 on my local machine.
import boto as boto2
import boto3
access_key_id = ...
secret_access_key = ...
bucket_name = ...
contents = chr(0)
def connect_with_boto2():
    connection = boto2.connect_s3(access_key_id, secret_access_key)
    return connection

def connect_with_boto3():
    session = boto3.session.Session(
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key,
    )
    connection = session.resource('s3')
    return connection

def set_with_boto3():
    connection = connect_with_boto3()
    bucket = connection.Bucket(bucket_name)
    bucket.Object('boto3').put(Body=contents)

def set_with_boto2():
    connection = connect_with_boto2()
    bucket = connection.get_bucket(bucket_name)
    key = boto2.s3.key.Key(bucket, 'boto2')
    key.set_contents_from_string(contents)
%timeit set_with_boto2()
#10 loops, best of 3: 48.9 ms per loop
%timeit set_with_boto3()
#10 loops, best of 3: 73.1 ms per loop
%timeit connect_with_boto2()
#1000 loops, best of 3: 203 µs per loop
%timeit connect_with_boto3()
#10 loops, best of 3: 26.7 ms per loop
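For anyone who wants to reproduce these numbers outside IPython, here is a rough stdlib equivalent of the "best of 3" figure that %timeit reports. The helper name best_of is made up for this sketch, and it does not replicate %timeit's automatic choice of loop count.

```python
import timeit

def best_of(func, number=10, repeat=3):
    """Return the best per-call time in seconds.

    Runs `func` `number` times per round, for `repeat` rounds,
    and reports the fastest round divided by `number`.
    """
    totals = timeit.repeat(func, number=number, repeat=repeat)
    return min(totals) / number

# Hypothetical usage with the functions defined above:
#   print(best_of(connect_with_boto2, number=1000))
#   print(best_of(connect_with_boto3, number=10))
```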
Am I setting up these connections correctly, or am I comparing apples-to-oranges? If the latter, is there a way to get the boto3 performance to approximate boto2?
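One possible way to avoid paying the session setup cost on every request, sketched under the assumption that the server reuses long-lived worker threads: cache one resource per thread with threading.local, so each thread still has its own session but builds it only once. The names get_cached and make_s3_resource are hypothetical, and the sketch resolves credentials from the default chain rather than passing keys explicitly.

```python
import threading

_thread_local = threading.local()

def get_cached(factory, name="s3_resource"):
    """Return this thread's cached object, building it with factory() on first use."""
    obj = getattr(_thread_local, name, None)
    if obj is None:
        obj = factory()
        setattr(_thread_local, name, obj)
    return obj

def make_s3_resource():
    # Imported lazily so the caching helper itself has no boto3 dependency.
    import boto3
    # Assumption: credentials come from the environment or config files.
    session = boto3.session.Session()
    return session.resource("s3")

# Hypothetical usage inside a request handler:
#   s3 = get_cached(make_s3_resource)
#   s3.Bucket(bucket_name).Object("boto3").put(Body=contents)
```

This only helps when threads outlive a single request. With short-lived threads, every new thread would still pay the full setup cost.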
Top GitHub Comments
@kyleknap I believe there are a few more items to consider here
Therefore, recommending that users create more instances in all multi-threaded use cases isn’t good advice. It’s probably better to make Session and Resource immutable and thread-safe, and to let them manage connection pool sizing and similar concerns. That would better support all multi-threaded scenarios, including short-lived threads used to parallelise I/O, and would bring a good performance gain along the way.
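Until something like that exists, one workaround for short-lived worker threads might be to share a single low-level client, which the boto3 documentation describes as generally thread-safe (unlike Session and Resource), and to size its connection pool to the number of workers. This is only a sketch: make_shared_client and put_many are hypothetical names, and the pool size simply mirrors the worker count.

```python
from concurrent.futures import ThreadPoolExecutor

def make_shared_client(max_workers):
    # Lazy imports: boto3/botocore are only needed when a real client is built.
    import boto3
    from botocore.config import Config
    # Assumption: one pooled HTTP connection per worker thread is enough.
    return boto3.client("s3", config=Config(max_pool_connections=max_workers))

def put_many(client, bucket_name, items, max_workers=8):
    """Upload (key, body) pairs in parallel through one shared client.

    Returns the keys in the same order as `items`.
    """
    def put_one(item):
        key, body = item
        client.put_object(Bucket=bucket_name, Key=key, Body=body)
        return key

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(put_one, items))

# Hypothetical usage:
#   client = make_shared_client(8)
#   put_many(client, bucket_name, [("a", b"1"), ("b", b"2")], max_workers=8)
```

The client is created once, in a single thread, before any workers start. The workers only issue requests through it.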
Greetings! It looks like this issue hasn’t been active in longer than one year. We encourage you to check if this is still an issue in the latest release. Because it has been longer than one year since the last update on this, and in the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment to prevent automatic closure, or if the issue is already closed, please feel free to reopen it.