Very slow boto3.client.put_object
We're seeing an extremely puzzling issue: one of two machines, running identical code and nearly identical configurations, exhibits wildly slower boto3.client('s3').put_object performance than the other (note: we only instantiate the client once per thread/process). Using boto3 and running multiple processes, Machine #2 transfers data at around 1.5 Gbps while Machine #1 transfers data at around 0.015 Gbps.
The machine configurations are slightly different (mostly in which network monitoring tools they run), so that's suspicious, but we've confirmed that uploading with the awscli tool runs at roughly 1 Gbps on either machine. So both Machine #1's and Machine #2's network setups are fine.
To check raw boto3, we started a fresh Python REPL, ran a minimal boto3.client.put_object test, and saw the same very low performance on Machine #1.
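For reference, the minimal test looked roughly like the sketch below. The bucket name, key, and helper names are placeholders we're inventing here, not details from the original report; boto3 is imported inside the benchmark so the throughput arithmetic can be checked on its own:

```python
import time

def throughput_gbps(nbytes: int, seconds: float) -> float:
    """Convert a byte count and a duration into gigabits per second."""
    return nbytes * 8 / seconds / 1e9

def benchmark_put(bucket: str, key: str, size: int = 100 * 1024 * 1024) -> float:
    """Time a single put_object of `size` zero bytes and return Gbps."""
    import boto3  # imported here so the timing helper is usable without boto3
    client = boto3.client('s3')
    payload = b'\0' * size  # stands in for the ~100 MB MP4 body
    start = time.monotonic()
    client.put_object(Bucket=bucket, Key=key, Body=payload)
    return throughput_gbps(size, time.monotonic() - start)

if __name__ == '__main__':
    # 'my-test-bucket' and the key are placeholders, not the reporter's names
    print(f"{benchmark_put('my-test-bucket', 'bench/test.bin'):.3f} Gbps")
```

Running this on each machine gives a directly comparable single-call number, independent of our multi-process upload script.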
We switched our upload script on Machine #2 from boto3 to subprocess-calling awscli, and Machine #2's performance headed toward Machine #1's (after accounting for the effect on Amdahl's Law of shelling out to a fresh interpreter).
So we've ruled out every explanation we can think of for the slowness of boto3.client.put_object on Machine #1 and are left with boto3.client.put_object itself as the culprit. An additional strange characteristic of the slowness: using 'bmon', we can watch traffic on the interface slowly ramp up (exponentially?) until the file finishes uploading, which can take up to a minute. Additionally, CPU sys% sits around 10% on Machine #1, similar to Machine #2, which indicates significant network activity even though the observed traffic is low.
Our usage of boto3 is basically (where data can be a 100MB MP4):
s3_client = boto3.client('s3')

def upload(key, data):
    s3_client.put_object(Bucket=BUCKET_NAME,
                         StorageClass='REDUCED_REDUNDANCY',
                         Key=key,
                         Body=data,
                         Metadata={'source': args.source})
We’ve run out of ideas for diagnostics. Do you have any pointers for us or any ideas as to the failure mode we’re seeing?
Issue Analytics
- Created 8 years ago
- Comments: 14 (5 by maintainers)
Top GitHub Comments
@keven425 To be honest, I have little recollection of the issue (it's been nearly three years). The issue occurred in a cluster of 8 nearly identical Debian Jessie Dell R720xd machines that were uploading 10-minute videos to AWS S3 (of mice & rats; see https://vium.com). The machines were directly connected, via a router, to AWS Direct Connect over a 10 Gbps optical fiber.
Agree with @jamesls. And this is the documentation for s3_client.upload_file(). It accepts a filename and automatically splits a large file into multiple chunks, with a default chunk size of 8 MB and a default concurrency of 10; each chunk is streamed through the aforementioned low-level APIs. This will generally give you much better throughput than a single-threaded put_object(). Please let us know whether it makes a difference.