
Expose max_bandwidth configuration from s3transfer

See original GitHub issue

This is a feature request to expose the new max_bandwidth config option from s3transfer. It should be fairly trivial to expose the option in boto3.s3.transfer.TransferConfig now that s3transfer supports it (as of 0.1.12).
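For context, s3transfer interprets max_bandwidth as a cap in bytes per second. A quick back-of-the-envelope sketch (plain Python, no boto3 required) of what a given limit implies for transfer time:

```python
MB = 1024 ** 2  # mirrors s3transfer.constants.MB

# A 1 MiB/s cap, expressed the way s3transfer expects it: bytes per second.
max_bandwidth = 1 * MB

# Minimum time to move a 40 MiB payload under that cap.
payload_size = 40 * MB
min_seconds = payload_size / max_bandwidth
print(min_seconds)  # 40.0
```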

Related issues + pull requests:

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Reactions: 12
  • Comments: 16 (3 by maintainers)

Top GitHub Comments

1 reaction
korbinianbauer commented, Oct 19, 2021

As an alternative to tong-bluehill’s workaround, I’m using my own minimal TransferConfig class, which adds the missing max_bandwidth key (implementation below, based on s3transfer.manager.TransferConfig).

This can then be passed to .upload_fileobj() or .upload_file() as per usual.

The MyTransferConfig class:
from s3transfer.constants import KB, MB

class MyTransferConfig(object):
    def __init__(self,
                 multipart_threshold=8 * MB,
                 multipart_chunksize=8 * MB,
                 max_request_concurrency=10,
                 max_submission_concurrency=5,
                 max_request_queue_size=1000,
                 max_submission_queue_size=1000,
                 max_io_queue_size=1000,
                 io_chunksize=256 * KB,
                 num_download_attempts=5,
                 max_in_memory_upload_chunks=10,
                 max_in_memory_download_chunks=10,
                 max_bandwidth=None,  # <-- added
                 use_threads=True):

        self.multipart_threshold = multipart_threshold
        self.multipart_chunksize = multipart_chunksize
        self.max_request_concurrency = max_request_concurrency
        self.max_submission_concurrency = max_submission_concurrency
        self.max_request_queue_size = max_request_queue_size
        self.max_submission_queue_size = max_submission_queue_size
        self.max_io_queue_size = max_io_queue_size
        self.io_chunksize = io_chunksize
        self.num_download_attempts = num_download_attempts
        self.max_in_memory_upload_chunks = max_in_memory_upload_chunks
        self.max_in_memory_download_chunks = max_in_memory_download_chunks
        self.max_bandwidth = max_bandwidth  # <-- added
        self.use_threads = use_threads
        self._validate_attrs_are_nonzero()

    def _validate_attrs_are_nonzero(self):
        for attr, attr_val in self.__dict__.items():
            if attr_val is not None and attr_val <= 0:
                raise ValueError(
                    'Provided parameter %s of value %s must be greater than '
                    '0.' % (attr, attr_val))

It’s more LOC up front for the class, but after that it’s just a matter of passing the Config key to the client’s upload method call, and that’s it.

Usage:

upload_config = MyTransferConfig(max_bandwidth=1024**2) # 1MiB/s

s3_client.upload_file(path, bucket_name, object_key, Config=upload_config)

# or

with open(path, "rb") as f:
    s3_client.upload_fileobj(f, bucket_name, object_key, Config=upload_config)

Result of uploading ~40MiB worth of files with a 1 MiB/s bandwidth limit:

(screenshot: upload throughput capped at 1 MiB/s)

I noticed that for low bandwidths like 100 KiB/s, the upload speed is quite “bursty”: (screenshot: bursty throughput at a 100 KiB/s limit)

I could stabilize this behaviour by lowering the default value of the bytes_threshold parameter of the BandwidthLimitedStream class in bandwidth.py from 256 * 1024 to a lower value, e.g. 8 * 1024. However, I didn’t investigate the side effects of doing so, e.g. whether it adds serious overhead on disk I/O or the network, so feel free to propose a better solution. (screenshot: smoothed throughput at 100 KiB/s after lowering bytes_threshold)
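The burstiness follows from thresholded enforcement: a limiter that only pauses once a threshold’s worth of bytes has accumulated sends everything up to the threshold back-to-back, then stalls. A simplified model of this (plain Python, independent of s3transfer; the function and its parameters are hypothetical, not the library’s real internals) shows how the threshold shapes the send pattern:

```python
def send_times(total_bytes, rate, chunk, threshold):
    """Simulate a rate limiter that only enforces the rate once
    `threshold` bytes have accumulated since the last pause."""
    t = 0.0
    pending = 0   # bytes sent since the last enforced pause
    times = []    # virtual timestamp at which each chunk is sent
    for _ in range(total_bytes // chunk):
        times.append(t)
        pending += chunk
        if pending >= threshold:
            t += pending / rate  # pay for the whole burst at once
            pending = 0
    return times

KB = 1024
# 100 KiB/s limit, 8 KiB chunks. With a 256 KiB threshold, 32 chunks go
# out back-to-back, followed by a ~2.6 s stall; with an 8 KiB threshold,
# every chunk is spaced out individually.
bursty = send_times(512 * KB, 100 * KB, 8 * KB, 256 * KB)
smooth = send_times(512 * KB, 100 * KB, 8 * KB, 8 * KB)
print(bursty[:3])  # [0.0, 0.0, 0.0] -- first chunks leave with no delay
print(smooth[:3])  # strictly increasing timestamps
```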

I also wrote a small method to apply this patch automatically:

import os

def patch_s3transfer_upload_chunksize():
    import s3transfer
    bandwidth_module_file = os.path.join(os.path.dirname(s3transfer.__file__), "bandwidth.py")
    patch_cmd = (r"sed -i 's/bytes_threshold=256 \* 1024/bytes_threshold=8 \* 1024/g' "
                 + bandwidth_module_file)
    os.system(patch_cmd)
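Rewriting the installed file with sed is fragile: it breaks on upgrade, on non-GNU sed, and on read-only installs. A less invasive alternative is to override the constructor’s default at runtime via __defaults__. This is sketched against a stand-in class; before applying it to s3transfer, the actual position of bytes_threshold in BandwidthLimitedStream’s signature would need to be checked:

```python
class LimitedStream:  # stand-in for s3transfer.bandwidth.BandwidthLimitedStream
    def __init__(self, fileobj=None, bytes_threshold=256 * 1024):
        self.bytes_threshold = bytes_threshold

# __defaults__ holds defaults for the trailing parameters, left to right;
# here the tuple is (None, 256 * 1024), so index 1 is bytes_threshold.
defaults = list(LimitedStream.__init__.__defaults__)
defaults[1] = 8 * 1024
LimitedStream.__init__.__defaults__ = tuple(defaults)

print(LimitedStream().bytes_threshold)  # 8192
```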
1 reaction
tong-bluehill commented, Jul 16, 2021

Since max_bandwidth is supported in boto/s3transfer but not in boto/boto3, the workaround is to use s3transfer directly to upload with a limit. My code:

import boto3
from s3transfer.manager import TransferConfig, TransferManager

s3_client = boto3.client('s3')
tc = TransferConfig(max_bandwidth=100 * 1000)  # bytes per second
tm = TransferManager(s3_client, tc)
tf = tm.upload('/my/file', 'mys3bucket', 'mys3key')
print(tf.result())

Read more comments on GitHub >
