Improve handling of S3 offloading configuration in other regions
Is your enhancement request related to a problem? Please describe. Currently, when using S3 offloading in a region other than us-east-1, the underlying jclouds library does some non-standard AWS things which require both:
A) the policy Pulsar is running under to have the GetBucketLocation permission, and
B) the endpoint to be changed to a region-specific endpoint
This behavior is confusing, poorly documented, difficult to explain, and different from most AWS implementations. See https://github.com/apache/pulsar/issues/3833 for context.
Describe the solution you’d like We should do 2 things:
- See if we can eliminate the need for GetBucketLocation. Looking at https://github.com/apache/jclouds/blob/31a3e5b5df1543d04098e3a694130b7ae8e6e079/apis/s3/src/main/java/org/jclouds/s3/config/S3HttpApiModule.java#L91, it appears to only be used when jclouds detects multiple regions. Where jclouds is getting more than one region from isn’t clear, but if the user sets a region, we should just use that single region and skip the GetBucketLocation check.
- Ensure that setting just the region is sufficient to configure the correct endpoint. Getting rid of the GetBucketLocation check may be sufficient such that the default endpoint works; otherwise, we should build the correct endpoint name if the region is specified but no endpoint is manually provided.
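As a rough illustration of the second point, the fallback could look something like the sketch below. This is not existing Pulsar or jclouds code: the class and method names are hypothetical, the two parameters merely stand in for whatever region and endpoint values the offloader is configured with, and the host pattern assumes the standard AWS regional form s3.<region>.amazonaws.com (with the .cn suffix for China regions).

```java
import java.util.Optional;

// Hypothetical helper, not part of Pulsar or jclouds.
final class S3EndpointSketch {
    private S3EndpointSketch() {}

    /**
     * Picks the endpoint to hand to the blob store:
     * 1. an explicitly configured endpoint always wins;
     * 2. otherwise a regional endpoint is built from the region;
     * 3. otherwise empty, meaning "use the provider default".
     */
    static Optional<String> resolveEndpoint(String region, String configuredEndpoint) {
        if (configuredEndpoint != null && !configuredEndpoint.trim().isEmpty()) {
            return Optional.of(configuredEndpoint.trim());
        }
        if (region == null || region.trim().isEmpty()) {
            return Optional.empty();
        }
        String r = region.trim();
        // Assumption: China regions use the amazonaws.com.cn DNS suffix.
        String suffix = r.startsWith("cn-") ? "amazonaws.com.cn" : "amazonaws.com";
        return Optional.of("https://s3." + r + "." + suffix);
    }

    public static void main(String[] args) {
        // Region set, no endpoint: the regional endpoint is derived.
        System.out.println(resolveEndpoint("us-west-2", null).orElse("<provider default>"));
        // Explicit endpoint: returned untouched.
        System.out.println(resolveEndpoint("us-west-2", "http://localhost:9000").orElse("<provider default>"));
    }
}
```

Keeping the explicit endpoint as the highest-priority input means S3-compatible stores with custom endpoints would not be affected by such a change.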
Describe alternatives you’ve considered Another consideration (and perhaps still a longer-term goal) is to replace the use of jclouds for AWS (while still using it for other cloud providers), as jclouds has some other behavior that differs from AWS.
Additional context
Issue Analytics
- State:
- Created 3 years ago
- Reactions:1
- Comments:10 (9 by maintainers)
Top GitHub Comments
If my understanding is correct, the jclouds API we currently use doesn’t even permit us to set the region manually; it has to be derived from the endpoint: https://github.com/apache/pulsar/blob/98ad39ffa51239e389c73411dfb8df7f5592a5aa/tiered-storage/jcloud/src/main/java/org/apache/bookkeeper/mledger/offload/jcloud/provider/JCloudBlobStoreProvider.java#L283
Maybe we should help jclouds produce a clearer error message, or we could use the official AWS SDK to generate a jclouds-compatible endpoint in advance.
Here’s a possible workaround for some people’s use cases: https://stackoverflow.com/questions/73169813/jclouds-getbucketlocation-timeout-on-getblob/73902608#73902608
Note that this ticket also suggests a possible general solution whereby jclouds provides a mechanism for users to pre-load the bucket-to-region LoadingCache with key/value pairs for their buckets that are in known regions. This would seem to be simpler than trying to rework the api from top to bottom to pass in the user-specified endpoint.
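To make the pre-loading idea concrete, here is a minimal sketch of the concept only. It does not use the jclouds LoadingCache or any real jclouds API; the class name and the remoteLookup function are hypothetical stand-ins, with remoteLookup playing the role of the GetBucketLocation call that would otherwise be made for an unknown bucket.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical illustration; jclouds does not expose this class.
final class BucketRegionCacheSketch {
    private final Map<String, String> regionsByBucket = new ConcurrentHashMap<>();
    private final Function<String, String> remoteLookup;

    /**
     * @param knownRegions bucket-to-region pairs the user already knows
     * @param remoteLookup fallback for unknown buckets (stand-in for GetBucketLocation)
     */
    BucketRegionCacheSketch(Map<String, String> knownRegions,
                            Function<String, String> remoteLookup) {
        this.regionsByBucket.putAll(knownRegions);
        this.remoteLookup = remoteLookup;
    }

    /** Returns the cached region, calling the fallback only on a cache miss. */
    String regionFor(String bucket) {
        return regionsByBucket.computeIfAbsent(bucket, remoteLookup);
    }
}
```

With a cache seeded this way, buckets in known regions would never trigger the remote lookup, so the extra permission would only matter for buckets the user did not list.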