
s3fs creates a bucket even when one already exists, when using pandas.DataFrame.to_csv


What happened:

s3fs creates a bucket even when one already exists, when using pandas.DataFrame.to_csv

What you expected to happen:

A bucket should not be created when one already exists so that:

  1. Users that have not been granted the s3:CreateBucket action can still write objects to the bucket
  2. The CreateBucket event is not triggered unnecessarily. This can be a problem for automatic processes that run on the CreateBucket event, like bucket tagging or compliance checks, that shouldn’t run on existing buckets.
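In other words, the expected behaviour is to check whether the bucket exists before trying to create it. The sketch below shows that check-before-create logic against a hypothetical in-memory stand-in for S3. `FakeS3Client`, `ensure_bucket` and the `allow_create` flag are illustrative names only. They are not part of s3fs or boto3, and this is not how s3fs is actually implemented.

```python
class FakeS3Client:
    """Hypothetical in-memory stand-in for an S3 client (not a real API)."""

    def __init__(self, buckets, allow_create=False):
        self.buckets = set(buckets)
        self.allow_create = allow_create  # models whether s3:CreateBucket is granted
        self.create_calls = 0  # counts CreateBucket requests (each would fire an event)

    def bucket_exists(self, bucket):
        return bucket in self.buckets

    def create_bucket(self, bucket):
        self.create_calls += 1
        if not self.allow_create:
            raise PermissionError("Access Denied")
        self.buckets.add(bucket)


def ensure_bucket(client, bucket):
    """Create `bucket` only if it does not already exist (hypothetical helper)."""
    if client.bucket_exists(bucket):
        return False  # nothing to do: no CreateBucket call, so no event
    client.create_bucket(bucket)
    return True


client = FakeS3Client(buckets={"oliver-delete-me-test"})
created = ensure_bucket(client, "oliver-delete-me-test")
print(created, client.create_calls)  # False 0
```

Because `create_bucket` is never reached for an existing bucket, a user without `s3:CreateBucket` (`allow_create=False` here) can still write, and no CreateBucket event is emitted.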

Minimal Complete Verifiable Example:

$ pip install pandas==1.2.3 s3fs==0.5.2
...
$ python
Python 3.7.9 (default, Oct 14 2020, 16:19:52)
[Clang 12.0.0 (clang-1200.0.32.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd; df = pd.DataFrame(["foo","bar"])
>>> df.to_csv("s3://oliver-delete-me-test/foowrite.csv")
Traceback (most recent call last):
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/s3fs/core.py", line 599, in _mkdir
    await self.s3.create_bucket(**params)
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/aiobotocore/client.py", line 154, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the CreateBucket operation: Access Denied
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/pandas/core/generic.py", line 3403, in to_csv
    storage_options=storage_options,
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/pandas/io/formats/format.py", line 1083, in to_csv
    csv_formatter.save()
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/pandas/io/formats/csvs.py", line 234, in save
    storage_options=self.storage_options,
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/pandas/io/common.py", line 563, in get_handle
    storage_options=storage_options,
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/pandas/io/common.py", line 345, in _get_filepath_or_buffer
    filepath_or_buffer, mode=fsspec_mode, **(storage_options or {})
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/fsspec/core.py", line 438, in open
    **kwargs,
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/fsspec/core.py", line 291, in open_files
    [fs.makedirs(parent, exist_ok=True) for parent in parents]
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/fsspec/core.py", line 291, in <listcomp>
    [fs.makedirs(parent, exist_ok=True) for parent in parents]
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/s3fs/core.py", line 614, in makedirs
    self.mkdir(path, create_parents=True)
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/fsspec/asyn.py", line 121, in wrapper
    return maybe_sync(func, self, *args, **kwargs)
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/fsspec/asyn.py", line 100, in maybe_sync
    return sync(loop, func, *args, **kwargs)
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/fsspec/asyn.py", line 71, in sync
    raise exc.with_traceback(tb)
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/fsspec/asyn.py", line 55, in f
    result[0] = await future
  File "/Users/tekumara/.virtualenvs/tmp-36835a7413f0541/lib/python3.7/site-packages/s3fs/core.py", line 603, in _mkdir
    raise translate_boto_error(e) from e
PermissionError: Access Denied
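The fsspec frames show where the bucket creation comes from. Before writing, fsspec's open_files calls fs.makedirs(parent, exist_ok=True) for each parent of the target path (fsspec/core.py, line 291 above). For an object at the top level of a bucket, that parent is the bucket itself. The sketch below approximates the parent computation with `posixpath.dirname`. That is an assumption: fsspec uses the filesystem's own parent logic, and `parent_dirs` is a hypothetical helper.

```python
import posixpath


def parent_dirs(paths):
    """Approximate the set of parents that would be passed to makedirs.

    Hypothetical helper: strips the protocol and takes the dirname of each
    path, which only approximates fsspec's own parent logic.
    """
    parents = set()
    for path in paths:
        # Drop a leading "s3://" so the first path component is the bucket.
        if "://" in path:
            path = path.split("://", 1)[1]
        parents.add(posixpath.dirname(path.rstrip("/")))
    return parents


print(parent_dirs(["s3://oliver-delete-me-test/foowrite.csv"]))
# {'oliver-delete-me-test'}  (the bucket itself becomes the makedirs target)
print(parent_dirs(["s3://oliver-delete-me-test/2021/foowrite.csv"]))
# {'oliver-delete-me-test/2021'}
```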

Anything else we need to know?:

This doesn’t happen when using s3fs directly, e.g.:

>>> import s3fs
>>> s3 = s3fs.S3FileSystem()
>>> with s3.open('oliver-delete-me-test/foowrite.csv', 'wb') as f:
...     f.write(2*2**20 * b'a')

This doesn’t happen in s3fs 0.4.2 because it doesn’t implement makedirs.
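The version difference can be pictured like this. The sketch assumes that fsspec's base filesystem class provides a no-op makedirs. Under that assumption, a filesystem that does not override it turns the auto-mkdir step into nothing. An override that creates the bucket, as described in this issue for 0.5.2, sends a CreateBucket request on every write. All classes and the `write_via_open_files` function below are hypothetical stand-ins, not the real s3fs or fsspec code.

```python
class BaseFS:
    """Hypothetical base class whose makedirs does nothing (assumed fsspec default)."""

    def __init__(self):
        self.create_bucket_calls = []

    def makedirs(self, path, exist_ok=False):
        pass  # object stores need no real directories, so do nothing


class OldS3FS(BaseFS):
    """Models s3fs 0.4.2: no makedirs override, so the no-op is inherited."""


class NewS3FS(BaseFS):
    """Models s3fs 0.5.2 as reported here: makedirs tries to create the bucket."""

    def makedirs(self, path, exist_ok=False):
        bucket = path.split("/", 1)[0]
        self.create_bucket_calls.append(bucket)  # stands in for a CreateBucket request


def write_via_open_files(fs, path):
    """Mimics the auto-mkdir step seen in the traceback before a write."""
    parent = path.rsplit("/", 1)[0]
    fs.makedirs(parent, exist_ok=True)


for fs in (OldS3FS(), NewS3FS()):
    write_via_open_files(fs, "oliver-delete-me-test/foowrite.csv")
    print(type(fs).__name__, fs.create_bucket_calls)
# OldS3FS []
# NewS3FS ['oliver-delete-me-test']
```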

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 5
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

1 reaction
martindurant commented, Apr 18, 2022

Let’s close this one, since it’s old, and @Mahdi-Hosseinali please open a new one showing what you tried to do and the problem you found.

0 reactions
Mahdi-Hosseinali commented, Apr 18, 2022

The PR was merged last July, but I tried 2021.11.1 and 2022.3.0 and still have this issue when writing cross-environment files.
