
s3fs creates a bucket even when one already exists, when using pandas.DataFrame.to_csv


What happened:

s3fs creates a bucket even when one already exists, when using pandas.DataFrame.to_csv

What you expected to happen:

A bucket should not be created when one already exists, so that:

- Users who have not been granted the s3:CreateBucket action can still write objects to the bucket.
- The CreateBucket event is not triggered unnecessarily. This can be a problem for automated processes that run on the CreateBucket event, such as bucket tagging or compliance checks, which shouldn’t run on existing buckets.
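
The expected behaviour can be sketched as "check first, create only if missing". The snippet below is a minimal illustration, not s3fs code: `FakeS3Client` and `ensure_bucket` are hypothetical names, and the client is an in-memory stand-in so the example runs without AWS access. A real existence check would itself need permission to query the bucket.

```python
class FakeS3Client:
    """In-memory stand-in for an S3 client (hypothetical, illustration only)."""

    def __init__(self, existing_buckets):
        self.buckets = set(existing_buckets)
        self.calls = []  # every API call made, in order

    def head_bucket(self, Bucket):
        self.calls.append(("HeadBucket", Bucket))
        if Bucket not in self.buckets:
            raise LookupError(f"bucket {Bucket!r} not found")

    def create_bucket(self, Bucket):
        self.calls.append(("CreateBucket", Bucket))
        self.buckets.add(Bucket)


def ensure_bucket(client, bucket):
    """Create `bucket` only when the existence check reports it missing.

    Returns True if a bucket was created, False if it already existed.
    """
    try:
        client.head_bucket(Bucket=bucket)
    except LookupError:
        client.create_bucket(Bucket=bucket)
        return True
    return False


client = FakeS3Client(existing_buckets={"oliver-delete-me-test"})
created = ensure_bucket(client, "oliver-delete-me-test")
print(created, client.calls)
```

With an existing bucket, no CreateBucket call is recorded, so neither the s3:CreateBucket permission nor the CreateBucket event comes into play.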

Minimal Complete Verifiable Example:

$ pip install pandas==1.2.3 s3fs==0.5.2
...
$ python
Python 3.7.9 (default, Oct 14 2020, 16:19:52)
[Clang 12.0.0 (clang-1200.0.32.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd; df = pd.DataFrame(["foo","bar"])
>>> df.to_csv("s3://oliver-delete-me-test/foowrite.csv")
Traceback (most recent call last):
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/s3fs/core.py", line 599, in _mkdir
    await self.s3.create_bucket(**params)
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/aiobotocore/client.py", line 154, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the CreateBucket operation: Access Denied
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/pandas/core/generic.py", line 3403, in to_csv
    storage_options=storage_options,
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/pandas/io/formats/format.py", line 1083, in to_csv
    csv_formatter.save()
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/pandas/io/formats/csvs.py", line 234, in save
    storage_options=self.storage_options,
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/pandas/io/common.py", line 563, in get_handle
    storage_options=storage_options,
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/pandas/io/common.py", line 345, in _get_filepath_or_buffer
    filepath_or_buffer, mode=fsspec_mode, **(storage_options or {})
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/fsspec/core.py", line 438, in open
    **kwargs,
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/fsspec/core.py", line 291, in open_files
    [fs.makedirs(parent, exist_ok=True) for parent in parents]
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/fsspec/core.py", line 291, in <listcomp>
    [fs.makedirs(parent, exist_ok=True) for parent in parents]
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/s3fs/core.py", line 614, in makedirs
    self.mkdir(path, create_parents=True)
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/fsspec/asyn.py", line 121, in wrapper
    return maybe_sync(func, self, *args, **kwargs)
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/fsspec/asyn.py", line 100, in maybe_sync
    return sync(loop, func, *args, **kwargs)
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/fsspec/asyn.py", line 71, in sync
    raise exc.with_traceback(tb)
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/fsspec/asyn.py", line 55, in f
    result[0] = await future
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/s3fs/core.py", line 603, in _mkdir
    raise translate_boto_error(e) from e
PermissionError: Access Denied
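
Reading the traceback bottom-up, fsspec's `open_files` calls `fs.makedirs(parent, exist_ok=True)` for the parent of the target path, s3fs's `makedirs` forwards to `mkdir(path, create_parents=True)`, and `_mkdir` ends up calling `create_bucket`. The toy model below mimics only that chain. It is a sketch inferred from the traceback, not the real s3fs or fsspec source; `ToyS3FileSystem`, `open_for_write`, and the rule "a path with no key part means create the bucket" are assumptions made for illustration.

```python
class ToyS3FileSystem:
    """Toy model of the calls seen in the traceback; NOT the real s3fs code."""

    def __init__(self, can_create_bucket):
        self.can_create_bucket = can_create_bucket
        self.calls = []  # records attempted API calls

    def _create_bucket(self, bucket):
        self.calls.append(("CreateBucket", bucket))
        if not self.can_create_bucket:
            raise PermissionError("Access Denied")

    def mkdir(self, path, create_parents=True):
        bucket, _, key = path.strip("/").partition("/")
        # Assumption: a path without a key part is treated as "make this bucket".
        if not key:
            self._create_bucket(bucket)

    def makedirs(self, path, exist_ok=False):
        # Mirrors the traceback: makedirs forwards to mkdir(create_parents=True).
        self.mkdir(path, create_parents=True)


def open_for_write(fs, path):
    """Mimic the open_files step: make the parent 'directory' before writing."""
    parent = path.rsplit("/", 1)[0]
    fs.makedirs(parent, exist_ok=True)


fs = ToyS3FileSystem(can_create_bucket=False)
try:
    open_for_write(fs, "oliver-delete-me-test/foowrite.csv")
except PermissionError as exc:
    print("PermissionError:", exc)
print(fs.calls)
```

In this model, a key directly under the bucket root has the bucket itself as its parent, which is why writing `foowrite.csv` attempts a CreateBucket call even though the bucket exists.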

Anything else we need to know?:

This doesn’t happen when using s3fs directly, e.g.:

>>> import s3fs
>>> s3 = s3fs.S3FileSystem()
>>> with s3.open('oliver-delete-me-test/foowrite.csv', 'wb') as f:
...     f.write(2*2**20 * b'a')

This doesn’t happen in s3fs 0.4.2 because it doesn’t implement makedirs.

Issue Analytics

Created: 2 years ago
Reactions: 5
Comments: 11 (5 by maintainers)

Top GitHub Comments

martindurant commented, Apr 18, 2022 (1 reaction)

Let’s close this one, since it’s old, and @Mahdi-Hosseinali please open a new one showing what you tried to do and the problem you found.

Mahdi-Hosseinali commented, Apr 18, 2022 (0 reactions)

The PR was merged last July. I tried 2021.11.1 and 2022.3.0 and still have this issue when writing cross-environment files.
