Connections to S3 much slower than with boto2?

I have developed a web application with boto (v2.36.0) and am trying to migrate it to use boto3 (v1.1.3). Because the application is deployed on a multi-threaded server, I connect to S3 for each HTTP request/response interaction.

Following the guidance here, I am generating a session per request to ensure thread-safety. However, I have noticed that whereas boto2 takes approximately 0.2 ms to connect to S3, boto3 takes approximately 25 ms.

Here is an IPython session that demonstrates the issue. This was run in IPython 3.0.0 on Python 3.4.3 (Anaconda 2.2.0, 64-bit) on Ubuntu 14.04 LTS on an AWS EC2 instance in us-east-1; however, I have noted similar behavior in Python 2.7 on my local machine.

import boto as boto2
import boto3

access_key_id = ...
secret_access_key = ...
bucket_name = ...
contents = chr(0)

def connect_with_boto2():
    connection = boto2.connect_s3(access_key_id, secret_access_key)
    return connection

def connect_with_boto3():
    session = boto3.session.Session(
        aws_access_key_id=access_key_id, 
        aws_secret_access_key=secret_access_key,
    )
    connection = session.resource('s3')
    return connection

def set_with_boto3():
    connection = connect_with_boto3()
    bucket = connection.Bucket(bucket_name)
    bucket.Object('boto3').put(Body=contents)

def set_with_boto2():
    connection = connect_with_boto2()
    bucket = connection.get_bucket(bucket_name)
    key = boto2.s3.key.Key(bucket, 'boto2')
    key.set_contents_from_string(contents)

%timeit set_with_boto2()
#10 loops, best of 3: 48.9 ms per loop

%timeit set_with_boto3()
#10 loops, best of 3: 73.1 ms per loop

%timeit connect_with_boto2()
#1000 loops, best of 3: 203 µs per loop

%timeit connect_with_boto3()
#10 loops, best of 3: 26.7 ms per loop

Am I setting up these connections correctly, or am I comparing apples-to-oranges? If the latter, is there a way to get the boto3 performance to approximate boto2?
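One way to keep the per-thread isolation without paying the setup cost on every request is to build the session once per thread and reuse it. The following is only a minimal sketch, assuming the server's worker threads live across many requests. The names `per_thread`, `make_s3_resource` and `get_s3` are hypothetical helpers, not part of boto3, and credentials are assumed to come from the environment.

```python
import threading

# One storage area per thread; attributes set here are invisible to other threads.
_local = threading.local()


def per_thread(name, factory):
    """Return this thread's cached object for `name`, building it on first use."""
    try:
        return getattr(_local, name)
    except AttributeError:
        obj = factory()
        setattr(_local, name, obj)
        return obj


def make_s3_resource():
    # Imported lazily so the caching helper itself has no boto3 dependency.
    import boto3

    # Assumption: credentials are resolved from the environment. Pass
    # aws_access_key_id / aws_secret_access_key as in the question if needed.
    return boto3.session.Session().resource("s3")


def get_s3():
    # The first call in a thread pays the session setup cost; later calls reuse it.
    return per_thread("s3", make_s3_resource)
```

A request handler would then call `get_s3().Bucket(bucket_name)` instead of `connect_with_boto3()`, so the setup cost measured above is paid once per thread rather than once per request.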

Issue Analytics

  • State: closed
  • Created 8 years ago
  • Comments:5 (2 by maintainers)

Top GitHub Comments

7 reactions
synhershko commented, Mar 22, 2016

@kyleknap I believe there are a few more items to consider here

  1. Multithreading doesn’t necessarily imply long-lived threads. Using multithreading to download files from S3 in parallel, for example with eventlet, is common in Python, and for that use case it actually makes sense to create the session and resource once and reuse them from the short-lived threads.
  2. Beyond that use case, many of the resources that get initialised could be reused. For example, the urllib3 error that often shows up when parallelising S3 reads with boto3, “Connection pool is full, discarding connection”, suggests that multi-threaded use of a shared session could be beneficial because the internal connection pools would be shared (see http://stackoverflow.com/a/18845952/135701 and https://github.com/openstack/python-swiftclient/commit/19d7e1812a99d73785146667ae2f3a7156f06898 for examples of possible solutions).

Therefore, recommending more instances for every multi-threaded use case isn’t good advice. It’s probably better to make Session and Resource immutable and thread-safe and let them manage connection pool sizing and similar details. That would better support all multi-threaded scenarios, including short-lived threads used to parallelise I/O, and bring a good performance gain along the way.
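For the short-lived-thread case described above, a rough sketch of sharing one object across workers could look like the following. It uses a low-level client rather than a Session or Resource, since boto3's documentation describes clients as generally thread-safe. It also assumes a botocore version recent enough to support the `max_pool_connections` setting, which is likely newer than the release in the original question. `make_shared_client` and `read_objects` are hypothetical helper names.

```python
from concurrent.futures import ThreadPoolExecutor


def make_shared_client(pool_size):
    # Imported lazily so read_objects can be exercised without boto3 installed.
    import boto3
    from botocore.config import Config

    # Size the connection pool to match the number of worker threads, so
    # connections are kept for reuse instead of being discarded.
    return boto3.client("s3", config=Config(max_pool_connections=pool_size))


def read_objects(client, bucket, keys, workers=10):
    """Read every key in `keys` using one shared client and a thread pool."""

    def read_one(key):
        return client.get_object(Bucket=bucket, Key=key)["Body"].read()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as `keys`.
        return list(pool.map(read_one, keys))
```

A call such as `read_objects(make_shared_client(20), bucket_name, keys, workers=20)` creates the client once and lets all 20 threads draw from the same pool.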

0 reactions
github-actions[bot] commented, Mar 2, 2021

Greetings! It looks like this issue hasn’t been active in longer than one year. We encourage you to check if this is still an issue in the latest release. Because it has been longer than one year since the last update on this, and in the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment to prevent automatic closure, or if the issue is already closed, please feel free to reopen it.
