S3 Multipart upload using threads

Hi there, it would be great to be able to upload chunks to S3 with ThreadPoolExecutor, in other words to send each chunk in a separate thread to make uploading faster. This is how we do it in my current project. I think it’s better to include this functionality in smart_open. wdyt? Let me know if it’s interesting and I’ll prepare a pull request.
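
For readers who want a concrete picture of the idea, here is a minimal sketch of a thread-per-chunk multipart upload. It is not the requester's code and not part of smart_open. It assumes `client` behaves like a boto3 S3 client (the method names `create_multipart_upload`, `upload_part`, `complete_multipart_upload` and `abort_multipart_upload` are boto3's), that `data` is a non-empty `bytes` object already in memory, and the helper names `iter_chunks` and `threaded_multipart_upload` are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def iter_chunks(data, part_size):
    """Yield (part_number, chunk) pairs; S3 part numbers start at 1."""
    for number, start in enumerate(range(0, len(data), part_size), start=1):
        yield number, data[start:start + part_size]


def threaded_multipart_upload(client, bucket, key, data,
                              part_size=5 * 1024 * 1024, max_workers=4):
    """Upload `data` as a multipart object, sending each part from a pool thread.

    Hypothetical helper: `client` is assumed to expose the boto3 S3 client API.
    """
    upload_id = client.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]

    def upload_part(numbered_chunk):
        number, chunk = numbered_chunk
        response = client.upload_part(
            Bucket=bucket, Key=key, UploadId=upload_id,
            PartNumber=number, Body=chunk,
        )
        return {"PartNumber": number, "ETag": response["ETag"]}

    try:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # map() yields results in submission order, so parts stay sorted
            # by part number even if the uploads finish out of order.
            parts = list(pool.map(upload_part, iter_chunks(data, part_size)))
        client.complete_multipart_upload(
            Bucket=bucket, Key=key, UploadId=upload_id,
            MultipartUpload={"Parts": parts},
        )
    except Exception:
        # Abort so that already-uploaded parts are not left behind.
        client.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
    return parts
```

With a real client this would be called as something like `threaded_multipart_upload(boto3.client("s3"), "my-bucket", "my-key", payload)`, where the bucket and key are placeholders. The 5 MiB default matches S3's minimum size for every part except the last one. Whether this is actually faster than a sequential upload depends on the network and would need to be measured.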

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

1 reaction
piskvorky commented, Oct 17, 2019

But threads won’t help you in I/O-bound operations. They add more CPU, which is not the bottleneck (I think).

I appreciate your offer, but as part of your PR, we’d need to see convincing benchmarks first. Can you post some concrete numbers?

Threads always add a lot of complexity, plus maintenance and support headaches, so unless the gains are obvious, I’m -1 on complicating the code.

0 reactions
mpenkov commented, Oct 18, 2019

@vryazanov Thank you for your interest in smart_open. Personally, I don’t think it’s worth including this functionality in smart_open, for several reasons.

First, there are already tools that handle that use case extremely well (for example, the AWS CLI).

Second, it would take a fair bit of effort to implement multi-threaded upload without breaking the existing abstraction of an upload being a write to a file stream. I’m not sure the benefits are worth the cost.
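
To give a rough sense of what the second point involves, the sketch below keeps the "write to a stream" interface while handing full parts to a thread pool. It is an illustration only, not smart_open's implementation. The class name `ThreadedPartWriter` is hypothetical, and `client` is again assumed to expose the boto3 S3 client methods.

```python
from concurrent.futures import ThreadPoolExecutor


class ThreadedPartWriter:
    """Hypothetical write-only stream that uploads full parts in pool threads."""

    def __init__(self, client, bucket, key,
                 part_size=5 * 1024 * 1024, max_workers=4):
        self._client, self._bucket, self._key = client, bucket, key
        self._part_size = part_size
        self._buffer = bytearray()
        self._futures = []
        self._closed = False
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._upload_id = client.create_multipart_upload(
            Bucket=bucket, Key=key)["UploadId"]

    def _send(self, number, body):
        response = self._client.upload_part(
            Bucket=self._bucket, Key=self._key, UploadId=self._upload_id,
            PartNumber=number, Body=body)
        return {"PartNumber": number, "ETag": response["ETag"]}

    def _submit(self, body):
        # Part numbers follow submission order, starting at 1.
        number = len(self._futures) + 1
        self._futures.append(self._pool.submit(self._send, number, body))

    def write(self, data):
        self._buffer += data
        # Hand off every full part; keep the remainder buffered.
        while len(self._buffer) >= self._part_size:
            self._submit(bytes(self._buffer[:self._part_size]))
            del self._buffer[:self._part_size]
        return len(data)

    def close(self):
        if self._closed:
            return
        self._closed = True
        if self._buffer:
            self._submit(bytes(self._buffer))
            self._buffer.clear()
        try:
            # result() re-raises any exception from a worker thread.
            parts = [future.result() for future in self._futures]
            self._client.complete_multipart_upload(
                Bucket=self._bucket, Key=self._key, UploadId=self._upload_id,
                MultipartUpload={"Parts": parts})
        except Exception:
            self._client.abort_multipart_upload(
                Bucket=self._bucket, Key=self._key, UploadId=self._upload_id)
            raise
        finally:
            self._pool.shutdown()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
```

Even this short version leaves open questions: pending parts pile up in memory if the caller writes faster than the network drains them, upload errors only surface at `close()`, and an empty stream is not handled. Those are the kinds of costs the comment above is weighing.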
