AWS S3 Multipart Upload - Performance Issues when uploading several files at once.
Hello,
I noticed that when uploading several files at once, uppy-io does not seem to call the completion endpoint immediately after all of a file’s parts have been uploaded. It gives precedence to uploading more parts over marking files as done by calling the completion endpoint. The issue becomes more pronounced when uploading more than 100 files at once.
I uploaded 100 files of about 5.5MB each as a test. The Dashboard status showed Uploading 100%, yet the file counter showed only 30 of 100 completed. That’s because the completion calls seem to have low priority, or are executed only after nearly all the parts in the entire batch have been uploaded.
This is confusing because the user sees that Uploading is 100% done when, from the user’s standpoint, it isn’t. Also, depending on how large the files are, it can take a very long time before a single file has actually had its completion endpoint called. This is a problem because additional backend processing cannot occur on completed files until nearly the entire batch upload has completed. For example, I also did a batch run of 465 files (3GB+ total) that took over 30 minutes to upload. The uploading percentage increased during the upload, but 0 of 465 files were completed until near the very end of the process.
I also watched the network tab in my browser while these files were being processed. I can see that a slew of calls to the complete (/{upload-id}/complete?key={file-key}) endpoint occurs only after nearly all the file parts from every file in the batch have finished uploading.
I’m requesting that after each file has had all of its parts uploaded, the completion endpoint for that file be called before the next file’s parts start processing/uploading.
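To make the request concrete, here is a rough, self-contained model of the two orderings. It is only a sketch and not Uppy's actual internals: the function name `simulate`, its options, and the assumption that every request takes one "tick" with at most `limit` requests per tick are all made up for illustration.

```javascript
// Simplified model (NOT Uppy's real scheduler): every request takes one tick,
// and at most `limit` requests run per tick.
// Returns, for each file, the tick at which its "complete" call finished.
function simulate({ files, partsPerFile, limit, completeFirst }) {
  // Queue all part uploads up front, file by file.
  const queue = [];
  for (let f = 0; f < files; f++) {
    for (let p = 0; p < partsPerFile; p++) {
      queue.push({ type: 'part', file: f });
    }
  }

  const partsLeft = new Array(files).fill(partsPerFile);
  const completedAtTick = new Array(files).fill(null);
  let tick = 0;

  while (queue.length > 0) {
    tick++;
    const batch = queue.splice(0, limit);
    for (const req of batch) {
      if (req.type === 'part') {
        partsLeft[req.file]--;
        if (partsLeft[req.file] === 0) {
          // All parts of this file are uploaded: schedule its completion call.
          const complete = { type: 'complete', file: req.file };
          if (completeFirst) {
            queue.unshift(complete); // requested: jump ahead of pending parts
          } else {
            queue.push(complete); // observed: wait behind all pending parts
          }
        }
      } else {
        completedAtTick[req.file] = tick;
      }
    }
  }
  return completedAtTick;
}

const options = { files: 3, partsPerFile: 2, limit: 2 };
console.log(simulate({ ...options, completeFirst: false })); // [ 4, 4, 5 ]
console.log(simulate({ ...options, completeFirst: true }));  // [ 2, 4, 5 ]
```

In this toy model the total number of ticks is the same either way, but with `completeFirst: true` the first file is marked complete at tick 2 instead of tick 4, which is the behaviour I am asking for.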
My simple setup: uppy-io v1.0.0 (from CDN - https://transloadit.edgly.net/releases/uppy/v1.0.0/uppy.min.js)
Using the Core, Dashboard, and AWS S3 Multipart plugins. I use my own backend (not Companion) that is compatible with the Companion multipart upload endpoints.
index.html:
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<title>Uppy</title>
<link href="https://transloadit.edgly.net/releases/uppy/v1.0.0/uppy.min.css" rel="stylesheet">
</head>
<body>
<div id="drag-drop-area"></div>
<script src="https://transloadit.edgly.net/releases/uppy/v1.0.0/uppy.min.js"></script>
<script src="index.js"></script>
</body>
</html>
index.js:
const AwsS3 = Uppy.AwsS3,
  AwsS3Multipart = Uppy.AwsS3Multipart;

const uppy = Uppy.Core()
  .use(Uppy.Dashboard, {
    height: 600,
    width: "100%",
    inline: true,
    disableThumbnailGenerator: true,
    showLinkToFileUploadResult: false,
    showProgressDetails: true,
    target: "#drag-drop-area"
  })
  .use(AwsS3Multipart, {
    limit: 4,
    companionUrl: "http://localhost:3001/api/storage-request/"
  })
  .on("complete", (result) => {
    console.log(result);
    console.log("Upload complete! We’ve uploaded these files:", result.successful);
  });
The issue seems to be that no upload will complete until all uploads have finished the createMultipartUpload stage. I set up some good old console.logging to see how the flow goes with 150 large files (~5MB each).
Flow currently
- createMultipartUpload gets called for each file
- the createMultipartUpload calls return
- completeMultipartUpload does not get called on any file until all files have successfully returned their createMultipartUpload calls
Better flow would be
- createMultipartUpload gets called
- completeMultipartUpload gets called on files as soon as they finish
@goto-bus-stop any thoughts on the suggested flow or ideas how to tackle this best? I’d love to get this fixed, as it’s causing
a) our servers getting swamped with possibly hundreds or thousands of upload requests, way ahead of time before the uploads even begin. If a user decides to close the browser before uploads finish, we end up with lots of unnecessary documents in our database.
b) it’s very confusing for the user that the files don’t get completed in the user interface, even when they have apparently been 100% uploaded
c) having the uploads not completed until after all uploads have returned their createMultipartUpload calls increases the risk of the uploads never being completed, even though they have been 100% uploaded (let’s say the user’s network dies just before the last createMultipartUpload call returns: the user has already uploaded 10-50 files, but none of them have been assembled by calling completeMultipartUpload)
Thanks for the detailed writeups, I’ll look into this stuff ASAP (half the team is on vacay and I’m moving apartments so things have been a bit slow on our end, sorry!)
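For reference while looking into it, one possible reading of the "better flow" discussed above can be sketched as a small model. This is an illustration only and not the plugin's implementation: `requestOrder`, `maxActive`, and the round-robin part scheduling are hypothetical. It assumes a file only gets its createMultipartUpload call once a slot is free, and gets its completeMultipartUpload call right after its last part.

```javascript
// Hypothetical model: returns the order in which requests would be issued if
// at most `maxActive` files are in flight and each file goes through
// create -> parts -> complete before its slot is handed to the next file.
function requestOrder(fileNames, partsPerFile, maxActive) {
  const log = [];
  const pending = [...fileNames];
  const active = []; // entries look like { name, partsSent }

  while (pending.length > 0 || active.length > 0) {
    // Only create multipart uploads when a slot is free, so the backend is
    // not hit with create requests for files that have not started yet.
    while (active.length < maxActive && pending.length > 0) {
      const name = pending.shift();
      log.push(`create ${name}`);
      active.push({ name, partsSent: 0 });
    }

    // Send one part per active file (simple round robin).
    for (const file of [...active]) {
      file.partsSent++;
      log.push(`part ${file.name}#${file.partsSent}`);
      if (file.partsSent >= partsPerFile) {
        // Last part done: complete immediately and free the slot.
        log.push(`complete ${file.name}`);
        active.splice(active.indexOf(file), 1);
      }
    }
  }
  return log;
}

console.log(requestOrder(['a', 'b', 'c'], 2, 2).join('\n'));
```

With three files, two parts each, and two slots, the create call for file c is only issued after files a and b have been completed, so closing the browser mid-batch would leave at most `maxActive` unfinished uploads registered on the backend.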