question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

lib-storage Upload does not retry if one part fails.

See original GitHub issue

Describe the bug

I’m using Upload from lib-storage to get advantage of multipart uploads regarding large files. I would expect it to retry in case any of failure uploading any part. Imagine we have a 5TB file that is splitted in 5MB parts. That would result in many, many parts, so the chance of individual failure is high.

There is the leavePartsOnError option. Setting it to true interrupts the whole upload if any part fails.

Your environment

SDK version number

@aws-sdk/lib-storage@3.13.1

Is the issue in the browser/Node.js/ReactNative?

Browser

Details of the browser/Node.js/ReactNative version

  Browsers:
    Chrome: 90.0.4430.85
    Firefox: 78.0.2
    Safari: 14.0.3

Steps to reproduce

Upload a file using Upload, exactly like in the example:

       try {
          const upload = new Upload({
            client: s3,
            params: uploadParams,
            leavePartsOnError: false
          })

          upload.on('httpUploadProgress', progress => {
            console.log(progress)
          })

          const uploadResult = await upload.done()
          console.log('done uploading', uploadResult)
        }
        catch (err) {
          showErrors('There was an error uploading your photo: ', err)
        }

Observed behavior

If there are no errors with any part, the upload goes right. If there is an error in one or more parts, that part is skipped and the final file is built without those parts. This results in a corrupt file. This can be reproduced stopping the connection (turning WiFi off) or using browser DevTools to stop the connection.

Expected behavior

If a single part fails to upload, what is likely to happen with big files in mobile environments, just retry to send that part. This could be done for example here: https://github.com/aws/aws-sdk-js-v3/blob/28008c2a3e30cd234f447484d76777d5db56dad1/lib/lib-storage/src/Upload.ts#L124-L130

Do not send the final build command if any part is missing: https://github.com/aws/aws-sdk-js-v3/blob/28008c2a3e30cd234f447484d76777d5db56dad1/lib/lib-storage/src/Upload.ts#L152-L162

Issue Analytics

  • State:open
  • Created 2 years ago
  • Reactions:5
  • Comments:7 (2 by maintainers)

github_iconTop GitHub Comments

9reactions
mificommented, Oct 10, 2021

If I understand it correctly, the Upload module is designed so that (with the default behavior leavePartsOnError=false) if one part fails to upload, the whole upload will finish but without that part, creating a corrupt file and the promise will resolve with success. To me this seems like a very scary default behavior, and I would think that leavePartsOnError=true would be a safer default, or am I missing something?

Update: If I use leavePartsOnError: true, it actually retries before failing with @aws-sdk/client-s3 3.36.0 when using the S3-client from @aws-sdk/client-s3, due to the built in retry strategy in the client.

5reactions
mkrncommented, Sep 14, 2021

Retry uploads of failed multiparts apparently was a feature of AWS SDK v2. Now with lib-storage it doesn’t have it, so it’s a regression.

Also, here’s what AWS recommends:

Enable a retry mechanism in the application making requests

Because of the distributed nature of Amazon S3, requests that return 500 or 503 errors can be retried. It's a best practice to build retry logic into applications that make requests to Amazon S3. We recommend that you set the retry count for your applications to 10.

All AWS SDKs have a built-in retry mechanism with an algorithm that uses exponential backoff. This algorithm implements increasingly longer wait times between retries for consecutive error responses. Most exponential backoff algorithms use jitter (randomized delay) to prevent successive collisions. For more information, see Error Retries and Exponential Backoff in AWS.

There are 500 errors from time to time when uploading multi-part files and currently, none of JS AWS maintained libraries handle them properly. Quick search shows how common and frustrating those are with no solution offered: https://www.google.com/search?q=s3+multipart+upload+error+500

@alvaromat I would argue not to use navigator.onLine, as then it would leave ability to use this code in different environments, not just browser (next.js, react-native(web), node, etc.)

All that’s needed to retry is just retry failed parts for maxRetryAttempts with a delay of a second or so… Since it’s retrying individual parts to recover from an occasional error 500, or to survive a short drop of connectivity, the exponential backoff is really not required.

Read more comments on GitHub >

github_iconTop Results From Across the Web

lib-storage Upload working but result is missing #2694 - GitHub
Describe the bug. Hello there,. Since 3.26.0 the Upload command from lib-storage does not return the same data and therefore is not usable....
Read more >
Retry limit exceeded when uploading file - Stack Overflow
This Error: Uploading to GCS from a stream: Retry limit exceeded Is escalated by Client Library: for error 404 that occurs more than...
Read more >
Uploading and copying objects using multipart upload
If any part uploads were in-progress, they can still succeed or fail even after you stop the upload. To make sure you free...
Read more >
Automatic retries in client libraries | Data Integration
Most Cloud Storage client libraries perform retries automatically. ... Client libraries determine if an error is safe to retry for a given ...
Read more >
Implement a retry policy using the Azure Storage client library ...
If your app requires high availability and greater resiliency against failures, you can leverage Azure Storage geo-redundancy options as part of ...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found