
Backoff and retry operations when S3 request limit is hit


Alluxio Version: 2.7

Describe the bug

When creating a lot of small files concurrently on an S3 UFS, the S3 rate limit is triggered and causes operations to fail. According to the AWS docs, the limit is 3,500 write requests (PUT/COPY/POST/DELETE) per second per prefix.

To Reproduce

Mount S3 as the UFS. Use a write type other than MUST_CACHE. Use the compaction benchmark tool: https://github.com/Alluxio/alluxio/pull/14600

Running something like

bin/alluxio runClass alluxio.stress.cli.client.CompactionBench --cluster \
--threads 32 --source-files 800 --source-dirs 100 --compact-ratio 100 --source-file-size 10k

on a cluster with 5 job workers triggers the rate limit during the preparation stage.

This command uses 32 threads on each job worker to create 5x100 directories in an S3 bucket and to create 800 small 10 KB files inside each directory.

Expected behavior

The rate limit is not a hard error, so Alluxio should handle it gracefully, e.g. use an exponential backoff retry policy to retry the operation after a short delay.
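
For illustration, here is a minimal sketch of what such a retry could look like around a single S3 put, using the AWS SDK for Java v1 that appears in the stack traces below. The class and constant names (S3SlowDownRetry, MAX_ATTEMPTS, etc.) are hypothetical, not Alluxio or SDK APIs, and the numbers are placeholders; the backoff uses the "full jitter" variant described in the AWS retry guidance.

import com.amazonaws.AmazonServiceException;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.PutObjectRequest;

import java.io.IOException;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper, not part of Alluxio: retries a put when S3 answers 503 SlowDown.
public final class S3SlowDownRetry {
  private static final int MAX_ATTEMPTS = 5;       // illustrative cap on attempts
  private static final long BASE_SLEEP_MS = 100;   // first backoff interval
  private static final long MAX_SLEEP_MS = 20_000; // upper bound on a single sleep

  public static void putWithBackoff(AmazonS3 client, PutObjectRequest request)
      throws IOException {
    for (int attempt = 1; ; attempt++) {
      try {
        client.putObject(request);
        return;
      } catch (AmazonServiceException e) {
        boolean throttled = e.getStatusCode() == 503 || "SlowDown".equals(e.getErrorCode());
        if (!throttled || attempt >= MAX_ATTEMPTS) {
          throw new IOException("S3 put failed after " + attempt + " attempt(s)", e);
        }
        // Exponential backoff with full jitter: sleep a random time in
        // [0, min(MAX_SLEEP_MS, BASE_SLEEP_MS * 2^(attempt-1))].
        long cap = Math.min(MAX_SLEEP_MS, BASE_SLEEP_MS << (attempt - 1));
        long sleepMs = ThreadLocalRandom.current().nextLong(cap + 1);
        try {
          Thread.sleep(sleepMs);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IOException("Interrupted while backing off", ie);
        }
      }
    }
  }
}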

Urgency: Medium.

Additional context

Error stack traces

2021-12-02 07:08:24,714 ERROR S3AOutputStream - Failed to upload compaction-base/source/CompactionBench-worker-0-1638428600140/1/225
com.amazonaws.services.s3.model.AmazonS3Exception: Please reduce your request rate. (Service: Amazon S3; Status Code: 503; Error Code: SlowDown; Request ID: AST0CG6ZMATS2XCH; S3 Extended Request ID: txaiOiA7+btb44IeotnNajogh0XKv63pwSnOPNNMMt3OmnESeOLpv7vFkw8t7dAy7cBSeDz6HSg=; Proxy: null), S3 Extended Request ID: txaiOiA7+btb44IeotnNajogh0XKv63pwSnOPNNMMt3OmnESeOLpv7vFkw8t7dAy7cBSeDz6HSg=
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1811)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1395)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1371)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5062)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5008)
        at com.amazonaws.services.s3.AmazonS3Client.access$300(AmazonS3Client.java:394)
        at com.amazonaws.services.s3.AmazonS3Client$PutObjectStrategy.invokeServiceCall(AmazonS3Client.java:5950)
        at com.amazonaws.services.s3.AmazonS3Client.uploadObject(AmazonS3Client.java:1812)
        at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1772)
        at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadInOneChunk(UploadCallable.java:168)
        at com.amazonaws.services.s3.transfer.internal.UploadCallable.call(UploadCallable.java:148)
        at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:115)
        at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:45)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2021-12-02 07:08:24,714 WARN  AbstractWriteHandler - Exception occurred while completing write request UfsFileWriteRequest{id=-1, sessionId=8802067248833981977, ufsPath=s3://compactionbench-default-9c8k/compaction-base/source/CompactionBench-worker-0-1638428600140/1/225, createUfsFileOptions=ufs_path: "s3://compactionbench-default-9c8k/compaction-base/source/CompactionBench-worker-0-1638428600140/1/225"
owner: "alluxio"
group: "alluxio"
mode: 438
mount_id: 1
acl {
  owningUser: "alluxio"
  owningGroup: "alluxio"
  userActions {
    name: ""
    actions {
      actions: READ
      actions: WRITE
    }
  }
  groupActions {
    name: ""
    actions {
      actions: READ
      actions: WRITE
    }
  }
  otherActions {
    actions: READ
    actions: WRITE
  }
  }
  isDefault: false
  isEmpty: false
}
}.: java.io.IOException: com.amazonaws.services.s3.model.AmazonS3Exception: Please reduce your request rate. (Service: Amazon S3; Status Code: 503; Error Code: SlowDown; Request ID: AST0CG6ZMATS2XCH; S3 Extended Request ID: txaiOiA7+btb44IeotnNajogh0XKv63pwSnOPNNMMt3OmnESeOLpv7vFkw8t7dAy7cBSeDz6HSg=; Proxy: null), S3 Extended Request ID: txaiOiA7+btb44IeotnNajogh0XKv63pwSnOPNNMMt3OmnESeOLpv7vFkw8t7dAy7cBSeDz6HSg=
2021-12-02 07:08:24,822 WARN  AbstractWriteHandler - Failed to cleanup states with error java.io.IOException: com.amazonaws.services.s3.model.AmazonS3Exception: Please reduce your request rate. (Service: Amazon S3; Status Code: 503; Error Code: SlowDown; Request ID: AST0T8R82T27SF79; S3 Extended Request ID: 5s4q8kCt6UG0rxwpplabubaVf90PRaASx87ztPOMH76bfNgav2ju9Yup0bTmGW5mA5Mwo4S5a4M=; Proxy: null), S3 Extended Request ID: 5s4q8kCt6UG0rxwpplabubaVf90PRaASx87ztPOMH76bfNgav2ju9Yup0bTmGW5mA5Mwo4S5a4M=.
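
As a side note, the SlowDown responses above surface even though the AWS SDK for Java v1 retries throttled requests a few times by default (unless Alluxio's client configuration overrides that), so a retry policy at the Alluxio UFS layer, or a higher retry ceiling on the client, is still needed under sustained load. A hedged sketch of raising the client's retry count, assuming the standard SDK v1 builder; the value 10 is an arbitrary example, not a recommendation:

import com.amazonaws.ClientConfiguration;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

// Hypothetical example class, not part of Alluxio.
public final class S3ClientWithMoreRetries {
  public static AmazonS3 build() {
    // The SDK's default retry policy treats HTTP 503 / SlowDown as retryable and
    // applies exponential backoff; this only raises how many attempts it makes.
    ClientConfiguration conf = new ClientConfiguration().withMaxErrorRetry(10);
    return AmazonS3ClientBuilder.standard()
        .withClientConfiguration(conf)
        .build();
  }
}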

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
ZhuTopher commented, Dec 6, 2021

Sure, I’ll take a look at the implementation you added on that branch to start with.

0 reactions
dbw9580 commented, Jan 25, 2022

No, the results are from the compaction benchmark; they just show a lot of job failures, nothing helpful for this issue.

