Backoff and retry operations when S3 request limit is hit
Alluxio Version: 2.7
Describe the bug
When creating many small files concurrently on an S3 UFS, S3's request rate limit is triggered and the operations fail. According to the AWS docs, the rate limit is 3,500 requests per second per prefix.
To Reproduce
Mount an S3 bucket as the UFS.
Use a write type other than MUST_CACHE.
Use the compaction benchmark tool: https://github.com/Alluxio/alluxio/pull/14600
Running something like
bin/alluxio runClass alluxio.stress.cli.client.CompactionBench --cluster \
--threads 32 --source-files 800 --source-dirs 100 --compact-ratio 100 --source-file-size 10k
on a cluster with 5 job workers triggers the rate limit during the preparation stage.
This command uses 32 threads on each job worker to create 5x100 directories in an S3 bucket, and inside each directory it creates 800 small files of 10 KB each; a back-of-envelope estimate of the resulting request rate follows.
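As a rough estimate (illustrative figures, not measurements from this run): 5 job workers × 32 threads gives 160 concurrent uploads, and if each 10 KB PUT completes in roughly 40 ms, the cluster can issue on the order of 160 / 0.04 ≈ 4,000 requests per second, which is above the 3,500 req/s per-prefix limit once enough of those keys share a prefix.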
Expected behavior
The rate limit is not a hard error, so Alluxio should handle it gracefully, e.g. by retrying the operation with an exponential backoff policy after a short delay, as sketched below.
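As a rough illustration of the kind of handling requested here (a minimal sketch, not Alluxio's actual implementation; the class name, the `uploadWithBackoff` helper, the retry bounds, and the bucket/key/file names are all made up for the example), a wrapper around the AWS SDK v1 `putObject` call could catch the 503 `SlowDown` error and retry with exponential backoff plus jitter:

```java
import java.io.File;
import java.util.concurrent.ThreadLocalRandom;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.AmazonS3Exception;

public class S3BackoffExample {
  // Hypothetical bounds; in practice these would come from Alluxio configuration.
  private static final int MAX_RETRIES = 10;
  private static final long BASE_SLEEP_MS = 50;
  private static final long MAX_SLEEP_MS = 10_000;

  /** Uploads a file, retrying with exponential backoff and jitter on S3 throttling (503 SlowDown). */
  static void uploadWithBackoff(AmazonS3 s3, String bucket, String key, File file)
      throws InterruptedException {
    for (int attempt = 0; ; attempt++) {
      try {
        s3.putObject(bucket, key, file);
        return;
      } catch (AmazonS3Exception e) {
        boolean throttled = e.getStatusCode() == 503 || "SlowDown".equals(e.getErrorCode());
        if (!throttled || attempt >= MAX_RETRIES) {
          throw e; // not a rate-limit error, or retries exhausted
        }
        // Exponential backoff capped at MAX_SLEEP_MS, with full jitter to spread out retries.
        long cap = Math.min(MAX_SLEEP_MS, BASE_SLEEP_MS << attempt);
        Thread.sleep(ThreadLocalRandom.current().nextLong(cap + 1));
      }
    }
  }

  public static void main(String[] args) throws Exception {
    AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
    // Bucket, key, and file are placeholders for the example.
    uploadWithBackoff(s3, "my-bucket", "compaction-base/source/example-key", new File("local.tmp"));
  }
}
```

Alternatively (or in addition), the AWS SDK's own retry behavior could be tuned when the UFS client is built, e.g. via ClientConfiguration.withRetryPolicy(...); which layer is the right place to retry depends on how the S3 UFS client is constructed.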
Urgency
Medium.
Additional context
Error stack traces
2021-12-02 07:08:24,714 ERROR S3AOutputStream - Failed to upload compaction-base/source/CompactionBench-worker-0-1638428600140/1/225
com.amazonaws.services.s3.model.AmazonS3Exception: Please reduce your request rate. (Service: Amazon S3; Status Code: 503; Error Code: SlowDown; Request ID: AST0CG6ZMATS2XCH; S3 Extended Request ID: txaiOiA7+btb44IeotnNajogh0XKv63pwSnOPNNMMt3OmnESeOLpv7vFkw8t7dAy7cBSeDz6HSg=; Proxy: null), S3 Extended Request ID: txaiOiA7+btb44IeotnNajogh0XKv63pwSnOPNNMMt3OmnESeOLpv7vFkw8t7dAy7cBSeDz6HSg=
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1811)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1395)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1371)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5062)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5008)
at com.amazonaws.services.s3.AmazonS3Client.access$300(AmazonS3Client.java:394)
at com.amazonaws.services.s3.AmazonS3Client$PutObjectStrategy.invokeServiceCall(AmazonS3Client.java:5950)
at com.amazonaws.services.s3.AmazonS3Client.uploadObject(AmazonS3Client.java:1812)
at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1772)
at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadInOneChunk(UploadCallable.java:168)
at com.amazonaws.services.s3.transfer.internal.UploadCallable.call(UploadCallable.java:148)
at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:115)
at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:45)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2021-12-02 07:08:24,714 WARN AbstractWriteHandler - Exception occurred while completing write request UfsFileWriteRequest{id=-1, sessionId=8802067248833981977, ufsPath=s3://compactionbench-default-9c8k/compaction-base/source/CompactionBench-worker-0-1638428600140/1/225, createUfsFileOptions=ufs_path: "s3://compactionbench-default-9c8k/compaction-base/source/CompactionBench-worker-0-1638428600140/1/225"
owner: "alluxio"
group: "alluxio"
mode: 438
mount_id: 1
acl {
owningUser: "alluxio"
owningGroup: "alluxio"
userActions {
name: ""
actions {
actions: READ
actions: WRITE
}
}
groupActions {
name: ""
actions {
actions: READ
actions: WRITE
}
}
otherActions {
actions: READ
actions: WRITE
}
}
isDefault: false
isEmpty: false
}
}.: java.io.IOException: com.amazonaws.services.s3.model.AmazonS3Exception: Please reduce your request rate. (Service: Amazon S3; Status Code: 503; Error Code: SlowDown; Request ID: AST0CG6ZMATS2XCH; S3 Extended Request ID: txaiOiA7+btb44IeotnNajogh0XKv63pwSnOPNNMMt3OmnESeOLpv7vFkw8t7dAy7cBSeDz6HSg=; Proxy: null), S3 Extended Request ID: txaiOiA7+btb44IeotnNajogh0XKv63pwSnOPNNMMt3OmnESeOLpv7vFkw8t7dAy7cBSeDz6HSg=
2021-12-02 07:08:24,822 WARN AbstractWriteHandler - Failed to cleanup states with error java.io.IOException: com.amazonaws.services.s3.model.AmazonS3Exception: Please reduce your request rate. (Service: Amazon S3; Status Code: 503; Error Code: SlowDown; Request ID: AST0T8R82T27SF79; S3 Extended Request ID: 5s4q8kCt6UG0rxwpplabubaVf90PRaASx87ztPOMH76bfNgav2ju9Yup0bTmGW5mA5Mwo4S5a4M=; Proxy: null), S3 Extended Request ID: 5s4q8kCt6UG0rxwpplabubaVf90PRaASx87ztPOMH76bfNgav2ju9Yup0bTmGW5mA5Mwo4S5a4M=.
Top GitHub Comments
Sure, I’ll take a look at the implementation you added on that branch to start with.
No, the results are from the compaction benchmark; they just show a lot of job failures and nothing helpful for this issue.