
AWS S3 Multipart Upload - Performance Issues when uploading several files at once.

See original GitHub issue

Hello,

I noticed that when uploading several files at once, Uppy does not seem to call the completion endpoint immediately after a file’s parts have finished uploading. It gives precedence to uploading more parts over marking files done by calling the completion endpoint. The issue becomes more pronounced when uploading more than 100 files at once.

I uploaded 100 files of about 5.5 MB each as a test. The Dashboard status showed Uploading 100%, yet the file count showed only 30 of 100 completed. That’s because the completion calls seem to have low priority, or are executed only after nearly all the parts in the entire batch have been uploaded.

This is confusing, because the user sees that Uploading is 100% done when, from the user’s standpoint, it isn’t. Depending on how large the files are, it can also take a very long time before a single file has actually had its completion endpoint called. This is a problem because additional backend processing cannot start on a completed file until nearly the entire batch has finished. For example, a batch of 465 files (3 GB+ total) took over 30 minutes to upload. The upload percentage increased throughout, but 0 of 465 files were completed until near the very end of the process.

I also watched the network tab in my browser while these files were being processed, and I can see that a slew of calls to the complete (/{upload-id}/complete?key={file-key}) endpoint occurs only after nearly all the parts from every file in the batch have finished.

I’m requesting that after a file has had all of its parts uploaded, its completion endpoint be called before the next file’s parts start uploading.

My simple setup: Uppy v1.0.0 (from CDN - https://transloadit.edgly.net/releases/uppy/v1.0.0/uppy.min.js)

I use the Core, Dashboard, and AWS S3 Multipart plugins, with my own backend (not Companion) that is compatible with the Companion multipart upload endpoints.
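For context, a Companion-compatible complete endpoint ultimately issues S3’s CompleteMultipartUpload API call, whose request body is an XML list of part numbers and ETags in ascending order. A minimal sketch of building that body (the helper name is hypothetical, and the `{ PartNumber, ETag }` parts shape is an assumption about what the client sends):

```javascript
// Build the XML request body for S3's CompleteMultipartUpload call.
// `parts` is assumed to arrive as [{ PartNumber: 1, ETag: '"abc"' }, ...].
// Helper name is hypothetical, not part of any library.
function buildCompleteBody(parts) {
  const items = parts
    .slice()
    .sort((a, b) => a.PartNumber - b.PartNumber) // S3 requires ascending part numbers
    .map((p) => `<Part><PartNumber>${p.PartNumber}</PartNumber><ETag>${p.ETag}</ETag></Part>`)
    .join('');
  return `<CompleteMultipartUpload>${items}</CompleteMultipartUpload>`;
}
```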

index.html:

<!doctype html>
<html>
<head>
    <meta charset="utf-8">
    <title>Uppy</title>
    <link href="https://transloadit.edgly.net/releases/uppy/v1.0.0/uppy.min.css" rel="stylesheet">
</head>
<body>
<div id="drag-drop-area"></div>

<script src="https://transloadit.edgly.net/releases/uppy/v1.0.0/uppy.min.js"></script>
<script src="index.js"></script>
</body>
</html>

index.js:

// AwsS3Multipart is the only uploader plugin used below.
const AwsS3Multipart = Uppy.AwsS3Multipart;

const uppy = Uppy.Core()
    .use(Uppy.Dashboard, {
        height: 600,
        width: "100%",
        inline: true,
        disableThumbnailGenerator: true,
        showLinkToFileUploadResult: false,
        showProgressDetails: true,
        target: "#drag-drop-area"
    })
    .use(AwsS3Multipart, {
        limit: 4,
        companionUrl: "http://localhost:3001/api/storage-request/"
    })
    .on("complete", (result) => {
        console.log(result);
        console.log("Upload complete! We’ve uploaded these files:", result.successful);
    });
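To surface a per-file "N of M completed" counter independent of the overall progress bar, one can count Uppy’s upload-success events, which fire once per finished file. A minimal sketch (the stand-in emitter only mimics Uppy’s on/emit shape so the snippet is self-contained; in real code, `emitter` would be the uppy instance):

```javascript
// Count completed files as they finish. upload-success fires once per file.
function trackCompletions(emitter, total, onProgress) {
  let done = 0;
  emitter.on('upload-success', (file) => {
    done += 1;
    onProgress(`${done} of ${total} completed`, file);
  });
}

// Minimal stand-in emitter with the same on/emit shape as an uppy instance.
function makeEmitter() {
  const handlers = {};
  return {
    on(event, fn) { (handlers[event] = handlers[event] || []).push(fn); },
    emit(event, ...args) { (handlers[event] || []).forEach((fn) => fn(...args)); },
  };
}
```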

Issue Analytics

  • State:closed
  • Created 4 years ago
  • Comments:12 (5 by maintainers)

Top GitHub Comments

1 reaction
arggh commented, Jul 22, 2019

The issue seems to be that any upload completes only after all uploads have finished the createMultipartUpload stage.

I set up some good old console.log-ging to see how the flow goes with 150 large files (~5 MB each).

Flow currently

  1. Uppy will call createMultipartUpload for each file
  2. Uppy will start uploading files as soon as the createMultipartUpload calls return
  3. Uppy will not, however, call completeMultipartUpload on any file until all files have successfully returned their createMultipartUpload calls.

Better flow would be

  1. Uppy will throttle uploads to a limited number of simultaneous uploads, having (for example) 2–4 uploads in flight at any given moment
  2. Only once an upload is actually about to start will createMultipartUpload get called
  3. Uploads will call completeMultipartUpload as soon as they finish
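The flow above can be sketched as a small promise pool, where each file holds its slot for its whole lifecycle: createMultipartUpload is issued only when a slot frees up, and completeMultipartUpload runs right after the last part (the three call names are stand-ins for the real Companion requests, not Uppy internals):

```javascript
// Minimal promise pool: at most `limit` files are in flight, and each file
// runs its whole lifecycle (create -> parts -> complete) before releasing
// its slot, so "complete" is never deferred behind other files' creates.
async function runBatch(files, limit, log = []) {
  // Stand-ins for the real Companion calls; here they just record events.
  const createMultipartUpload = async (f) => log.push(`create:${f}`);
  const uploadParts = async (f) => log.push(`parts:${f}`);
  const completeMultipartUpload = async (f) => log.push(`complete:${f}`);

  const queue = [...files];
  async function worker() {
    while (queue.length > 0) {
      const f = queue.shift();
      await createMultipartUpload(f);   // called lazily, only when a slot frees up
      await uploadParts(f);
      await completeMultipartUpload(f); // called immediately after the last part
    }
  }
  await Promise.all(Array.from({ length: limit }, worker));
  return log;
}
```

With `limit: 2` and four files, the first two files are fully completed before the third file’s createMultipartUpload is even issued, which is exactly the property the current flow lacks.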

@goto-bus-stop any thoughts on the suggested flow, or ideas on how best to tackle this? I’d love to get this fixed, as it’s causing:

a) our servers get swamped with possibly hundreds or thousands of upload requests well before the uploads even begin. If a user decides to close the browser before the uploads finish, we end up with lots of unnecessary documents in our database.

b) it’s very confusing for the user that files don’t get marked completed in the user interface even when they appear to be 100% uploaded

c) not completing uploads until all uploads have returned their createMultipartUpload calls increases the risk of uploads never being completed even though they are 100% uploaded (say the user’s network dies just before the last createMultipartUpload call returns: the user has already uploaded 10–50 files, but none of them has been assembled by calling completeMultipartUpload)

1 reaction
goto-bus-stop commented, May 13, 2019

Thanks for the detailed writeups, I’ll look into this stuff ASAP (half the team is on vacay and I’m moving apartments, so things have been a bit slow on our end, sorry!)


