Very slow boto3.client.put_object

See original GitHub issue

We’re seeing an extremely puzzling issue: one of two machines, which run identical code and are nearly identical in configuration, exhibits wildly slower boto3.client('s3').put_object performance than the other (note: we only instantiate the client once per thread/process). Using boto3 and running multiple processes, Machine #2 transfers data at around 1.5 Gbps while Machine #1 transfers data at around 0.015 Gbps.

The machine configurations are slightly different (mostly they have differing sets of network monitoring tools), so that’s suspicious, but we’ve confirmed that uploading using the awscli tool runs at roughly 1Gbps on either machine. So Machine #1 and #2’s network setups are fine.

Checking on raw boto3, we started up a fresh Python REPL and did a minimal test of boto3.client.put_object and saw the same very low performance on Machine #1.
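
A minimal timing test along those lines looks roughly like the sketch below (the bucket name and key are placeholders, not the values actually used):

import os
import time

import boto3

s3_client = boto3.client('s3')
payload = os.urandom(100 * 1024 * 1024)  # ~100 MB of random bytes

start = time.time()
s3_client.put_object(Bucket='example-test-bucket',  # placeholder bucket name
                     Key='puttest/sample.bin',      # placeholder key
                     Body=payload)
elapsed = time.time() - start
print('put_object: %.2f MB/s' % (len(payload) / (1024 * 1024) / elapsed))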

We switched our upload script on Machine #1 from using boto3 to subprocess-calling awscli, and Machine #1’s performance headed towards Machine #2’s (after accounting for the overhead of shelling out to a fresh interpreter, à la Amdahl’s Law).
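
Shelling out to awscli in this way amounts to roughly the following sketch (the local path, bucket, and key are placeholders):

import subprocess

def upload_via_cli(local_path, bucket, key):
    # Hand the transfer to the awscli tool in a child process; 'aws s3 cp'
    # does its own multipart/concurrent uploading.
    subprocess.run(['aws', 's3', 'cp', local_path, 's3://%s/%s' % (bucket, key)],
                   check=True)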

So we’ve ruled out every explanation we can think of for the slowness of boto3.client.put_object on Machine #1 and are left with boto3.client.put_object itself as the culprit. An additional strange characteristic of the slowness: using bmon, we can watch traffic on the interface slowly ramp up (exponentially?) until the file is completely uploaded, which can take up to a minute. Additionally, CPU sys % sits around 10% on Machine #1, which is similar to Machine #2 and suggests significant network activity (even though actual traffic is low).

Our usage of boto3 is basically (where data can be a 100MB MP4):

import boto3

BUCKET_NAME = '...'  # our destination bucket, defined elsewhere in the script
s3_client = boto3.client('s3')

def upload(key, data):
    # data is the entire object body held in memory (e.g. a ~100 MB MP4)
    s3_client.put_object(Bucket=BUCKET_NAME,
                         StorageClass='REDUCED_REDUNDANCY',
                         Key=key,
                         Body=data,
                         Metadata={'source': args.source})  # args comes from argparse

We’ve run out of ideas for diagnostics. Do you have any pointers for us or any ideas as to the failure mode we’re seeing?

Issue Analytics

  • State: closed
  • Created: 8 years ago
  • Comments: 14 (5 by maintainers)

Top GitHub Comments

3 reactions
alsonkemp commented, Jul 9, 2018

@keven425 To be honest, I have little recollection of the issue (it’s been nearly three years). The issue arose in a cluster of 8 nearly identical Debian Jessie Dell R720xd machines that were uploading 10-minute videos of mice and rats (see https://vium.com) to AWS S3. The machines were directly connected, via a router, to AWS Direct Connect over 10 Gbps optical fiber.

1 reaction
rayluo commented, Dec 17, 2015

Agree with @jamesls. And this is the documentation for s3_client.upload_file(). It accepts a filename, automatically splits a big file into multiple chunks (default chunk size 8 MB, default concurrency 10), and streams each chunk through the aforementioned low-level APIs. This will generally give you much better throughput than a single-threaded put_object(). Please let us know whether it makes a difference.
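
For illustration, a sketch of what switching to the managed transfer could look like (the filename, bucket, key, and metadata below are placeholders; the TransferConfig values shown are simply the documented defaults):

import boto3
from boto3.s3.transfer import TransferConfig

s3_client = boto3.client('s3')

# Defaults made explicit: split into 8 MB parts, up to 10 concurrent threads.
config = TransferConfig(multipart_chunksize=8 * 1024 * 1024,
                        max_concurrency=10)

s3_client.upload_file(Filename='video.mp4',            # placeholder local file
                      Bucket='example-bucket',         # placeholder bucket
                      Key='videos/video.mp4',          # placeholder key
                      ExtraArgs={'StorageClass': 'REDUCED_REDUNDANCY',
                                 'Metadata': {'source': 'camera-1'}},
                      Config=config)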

Read more comments on GitHub

Top Results From Across the Web

  • Boto3 put_object() is very slow - python - Stack Overflow
  • boto3 put_object() is very slow
  • S3 — Boto3 Docs 1.26.34 documentation - AWS
  • Troubleshoot slow or inconsistent speeds when downloading or uploading to Amazon S3
  • Downloading files from S3 with multithreading and Boto3
