s3fs creates a bucket even when one already exists, when using pandas.DataFrame.to_csv
What happened:
s3fs creates a bucket even when one already exists, when using pandas.DataFrame.to_csv
What you expected to happen:
A bucket should not be created when one already exists so that:
- Users that have not been granted the s3:CreateBucket action can still write objects to the bucket
- The CreateBucket event is not triggered unnecessarily. This can be a problem for automatic processes that run on the CreateBucket event, like bucket tagging or compliance checks, that shouldn’t run on existing buckets.
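The expected behaviour can be sketched as "check for the bucket first, create it only if it is missing". The snippet below is a minimal illustration of that idea, not the s3fs implementation: `FakeS3`, `bucket_exists`, `create_bucket` and `ensure_bucket` are hypothetical names for an in-memory stand-in, used so the example runs without AWS credentials.

```python
class FakeS3:
    """Hypothetical in-memory stand-in for an S3 client (not the s3fs or boto API)."""

    def __init__(self, existing_buckets, can_create=False):
        self.buckets = set(existing_buckets)
        self.can_create = can_create  # models whether s3:CreateBucket is granted
        self.create_calls = 0         # counts how often CreateBucket was attempted

    def bucket_exists(self, name):
        return name in self.buckets

    def create_bucket(self, name):
        self.create_calls += 1
        if not self.can_create:
            raise PermissionError("Access Denied")
        self.buckets.add(name)


def ensure_bucket(client, path):
    """Create the bucket for `path` only when it does not already exist."""
    bucket = path.strip("/").split("/", 1)[0]
    if client.bucket_exists(bucket):
        return False  # nothing to do: no CreateBucket call, no CreateBucket event
    client.create_bucket(bucket)
    return True


# A user without s3:CreateBucket writing into an existing bucket:
client = FakeS3({"oliver-delete-me-test"}, can_create=False)
created = ensure_bucket(client, "oliver-delete-me-test/foowrite.csv")
print(created, client.create_calls)
```

With this ordering, a missing create permission only matters when the bucket really is absent.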
Minimal Complete Verifiable Example:
$ pip install pandas==1.2.3 s3fs==0.5.2
...
$ python
Python 3.7.9 (default, Oct 14 2020, 16:19:52)
[Clang 12.0.0 (clang-1200.0.32.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd; df = pd.DataFrame(["foo","bar"])
>>> df.to_csv("s3://oliver-delete-me-test/foowrite.csv")
Traceback (most recent call last):
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/s3fs/core.py", line 599, in _mkdir
    await self.s3.create_bucket(**params)
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/aiobotocore/client.py", line 154, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the CreateBucket operation: Access Denied

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/pandas/core/generic.py", line 3403, in to_csv
    storage_options=storage_options,
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/pandas/io/formats/format.py", line 1083, in to_csv
    csv_formatter.save()
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/pandas/io/formats/csvs.py", line 234, in save
    storage_options=self.storage_options,
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/pandas/io/common.py", line 563, in get_handle
    storage_options=storage_options,
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/pandas/io/common.py", line 345, in _get_filepath_or_buffer
    filepath_or_buffer, mode=fsspec_mode, **(storage_options or {})
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/fsspec/core.py", line 438, in open
    **kwargs,
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/fsspec/core.py", line 291, in open_files
    [fs.makedirs(parent, exist_ok=True) for parent in parents]
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/fsspec/core.py", line 291, in <listcomp>
    [fs.makedirs(parent, exist_ok=True) for parent in parents]
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/s3fs/core.py", line 614, in makedirs
    self.mkdir(path, create_parents=True)
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/fsspec/asyn.py", line 121, in wrapper
    return maybe_sync(func, self, *args, **kwargs)
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/fsspec/asyn.py", line 100, in maybe_sync
    return sync(loop, func, *args, **kwargs)
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/fsspec/asyn.py", line 71, in sync
    raise exc.with_traceback(tb)
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/fsspec/asyn.py", line 55, in f
    result[0] = await future
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/s3fs/core.py", line 603, in _mkdir
    raise translate_boto_error(e) from e
PermissionError: Access Denied
Anything else we need to know?:
This doesn’t happen when using s3fs directly, e.g.:
>>> import s3fs
>>> s3 = s3fs.S3FileSystem()
>>> with s3.open('oliver-delete-me-test/foowrite.csv', 'wb') as f:
...     f.write(2*2**20 * b'a')
This doesn’t happen in s3fs 0.4.2 because it doesn’t implement makedirs.
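Since opening the object through s3fs directly did not trigger the bucket creation in this report, one possible workaround is to serialise the DataFrame yourself and write the bytes through an already-constructed filesystem object. The helper below is a sketch under that assumption: `write_csv_via_filesystem` is a hypothetical name, and it relies only on `fs.open(path, "wb")` and on `DataFrame.to_csv()` returning a string when no path is given.

```python
def write_csv_via_filesystem(fs, df, path):
    """Write `df` as CSV to `path` through `fs.open`, bypassing the URL-based open."""
    # With no target, to_csv() returns the CSV text instead of writing it anywhere.
    payload = df.to_csv().encode("utf-8")
    # Same call shape as the direct s3fs example in the report.
    with fs.open(path, "wb") as f:
        f.write(payload)


# Usage (needs pandas, s3fs and valid credentials, so it is left as comments):
#   import pandas as pd
#   import s3fs
#   df = pd.DataFrame(["foo", "bar"])
#   write_csv_via_filesystem(s3fs.S3FileSystem(), df, "oliver-delete-me-test/foowrite.csv")
```

Whether this avoids the CreateBucket call on a given s3fs version is something to verify in your own environment; the report only confirms it for the direct `s3.open` call shown above.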
Top GitHub Comments
Let’s close this one, since it’s old, and @Mahdi-Hosseinali please open a new one showing what you tried to do and the problem you found.
The PR was merged last July. I tried 2021.11.1 and 2022.3.0 and still have this issue when writing cross-environment files.