Disallowing ListObjectsV2 at the root of the bucket makes s3fs attempt to create a bucket
What happened:
It’s not uncommon to have a bucket that disallows listing the root but allows listing a specific prefix. In this case s3fs will fail any writes and will attempt to create the bucket, which often fails with a completely different error.
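For concreteness, here is a hedged sketch of an IAM identity policy that produces exactly this setup (the user name and "foo/" prefix are hypothetical, not from the issue): s3:ListBucket, the permission behind ListObjectsV2, is granted only under a prefix, so listing the bucket root stays denied.

import json
import boto3

BUCKET = "s3fs-test-bucket-123"

# Allow listing only under foo/; listing the bucket root remains denied
# by default. Object reads/writes under the prefix stay allowed.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{BUCKET}",
            "Condition": {"StringLike": {"s3:prefix": ["foo/*"]}},
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/foo/*",
        },
    ],
}

iam = boto3.client("iam")
iam.put_user_policy(
    UserName="s3fs-test-user",  # hypothetical user
    PolicyName="list-prefix-only",
    PolicyDocument=json.dumps(policy),
)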
What you expected to happen:
Falling back to creating a bucket is very strange behaviour. I imagine it’s legacy and impossible to change, but I would expect s3fs not to require full list-objects permission on the bucket in order to perform writes.
Minimal Complete Verifiable Example:
In [1]: import s3fs
In [2]: s3 = s3fs.S3FileSystem(anon=False)
In [5]: s3.mkdirs("s3://s3fs-test-bucket-123/foo/bar")
2021-09-17 17:34:50,222 - s3fs - DEBUG - _call_s3 -- CALL: list_objects_v2 - () - {'MaxKeys': 1, 'Bucket': 's3fs-test-bucket-123'}
2021-09-17 17:34:50,516 - s3fs - DEBUG - _call_s3 -- Nonretryable error: An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied
2021-09-17 17:34:50,516 - s3fs - DEBUG - _call_s3 -- CALL: create_bucket - () - {'Bucket': 's3fs-test-bucket-123', 'ACL': ''}
2021-09-17 17:34:50,576 - s3fs - DEBUG - _call_s3 -- Nonretryable error: An error occurred (IllegalLocationConstraintException) when calling the CreateBucket operation: The unspecified location constraint is incompatible for the region specific endpoint this request was sent to.
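Judging purely from the debug output above, the control flow appears to be: probe the bucket with a MaxKeys=1 listing, treat any failure (including AccessDenied) as "bucket missing", then fall through to create_bucket. A minimal boto3 sketch of that apparent behaviour, as an assumption rather than s3fs’s actual code:

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
bucket = "s3fs-test-bucket-123"

try:
    # The probe seen in the log: a one-key listing at the bucket root.
    s3.list_objects_v2(Bucket=bucket, MaxKeys=1)
except ClientError:
    # AccessDenied lands here too, so an existing-but-unlistable bucket
    # is mistaken for a missing one, and create_bucket then fails with
    # the unrelated IllegalLocationConstraintException seen above.
    s3.create_bucket(Bucket=bucket)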
The full traceback is like so:
File "/home/app/.cache/pypoetry/virtualenvs/x/lib/python3.9/site-packages/dask/dataframe/io/parquet/arrow.py", line 819, in initialize_write
fs.mkdirs(path, exist_ok=True)
File "/home/app/.cache/pypoetry/virtualenvs/x/lib/python3.9/site-packages/fsspec/spec.py", line 1159, in mkdirs
return self.makedirs(path, exist_ok=exist_ok)
File "/home/app/.cache/pypoetry/virtualenvs/x/lib/python3.9/site-packages/fsspec/asyn.py", line 88, in wrapper
return sync(self.loop, func, *args, **kwargs)
File "/home/app/.cache/pypoetry/virtualenvs/x/lib/python3.9/site-packages/fsspec/asyn.py", line 69, in sync
raise result[0]
File "/home/app/.cache/pypoetry/virtualenvs/x/lib/python3.9/site-packages/fsspec/asyn.py", line 25, in _runner
result[0] = await coro
File "/home/app/.cache/pypoetry/virtualenvs/x/lib/python3.9/site-packages/s3fs/core.py", line 731, in _makedirs
await self._mkdir(path, create_parents=True)
File "/home/app/.cache/pypoetry/virtualenvs/x/lib/python3.9/site-packages/s3fs/core.py", line 716, in _mkdir
await self._call_s3("create_bucket", **params)
It seems the bucket-existence check (the MaxKeys=1 list_objects_v2 call seen in the log) is failing to detect that the bucket exists. There are better methods to detect whether a bucket exists, such as get-bucket-location.
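As a hedged illustration of that suggestion (the error-code handling is my assumption about typical S3 responses, not s3fs code), a bucket-existence probe that needs no s3:ListBucket permission:

import boto3
from botocore.exceptions import ClientError

def bucket_exists(s3, bucket: str) -> bool:
    # get_bucket_location works without list permission on the bucket
    # contents; head_bucket would be another common choice.
    try:
        s3.get_bucket_location(Bucket=bucket)
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "NoSuchBucket":
            return False
        # AccessDenied etc.: the bucket may well exist, so surface the
        # error instead of falling back to create_bucket.
        raise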
Top GitHub Comments
I think this was designed for sub-directories, where we want to distinguish whether it’s the actual path that exists or only a sub-path. That wouldn’t apply to buckets, though, so MaxKeys could depend on the context.

Agree with @martindurant that it would reduce the burden, but it is still costly in some cases.
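To illustrate the sub-directory case that comment describes: with a Prefix, a single MaxKeys=1 listing can distinguish "something exists under this directory-like path" from "nothing there". A rough sketch under that reading, not s3fs internals:

import boto3

s3 = boto3.client("s3")

def prefix_exists(bucket: str, prefix: str) -> bool:
    # One cheap request: any key under the prefix proves the
    # directory-like path exists.
    resp = s3.list_objects_v2(
        Bucket=bucket, Prefix=prefix.rstrip("/") + "/", MaxKeys=1
    )
    return resp.get("KeyCount", 0) > 0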
No, we don’t cache the results.