BUG: s3 reads from public buckets not working

See original GitHub issue
  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample

# Your code here
import pandas as pd
df = pd.read_csv("s3://nyc-tlc/trip data/yellow_tripdata_2019-01.csv")
Error stack trace
Traceback (most recent call last):
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/pandas/io/s3.py", line 33, in get_file_and_filesystem
    file = fs.open(_strip_schema(filepath_or_buffer), mode)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/fsspec/spec.py", line 775, in open
    **kwargs
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/s3fs/core.py", line 378, in _open
    autocommit=autocommit, requester_pays=requester_pays)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/s3fs/core.py", line 1097, in __init__
    cache_type=cache_type)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/fsspec/spec.py", line 1065, in __init__
    self.details = fs.info(path)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/s3fs/core.py", line 530, in info
    Key=key, **version_id_kw(version_id), **self.req_kw)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/s3fs/core.py", line 200, in _call_s3
    return method(**additional_kwargs)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/client.py", line 316, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/client.py", line 622, in _make_api_call
    operation_model, request_dict, request_context)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/client.py", line 641, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/endpoint.py", line 132, in _send_request
    request = self.create_request(request_dict, operation_model)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/endpoint.py", line 116, in create_request
    operation_name=operation_model.name)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/signers.py", line 90, in handler
    return self.sign(operation_name, request)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/signers.py", line 160, in sign
    auth.add_auth(request)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/auth.py", line 357, in add_auth
    raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/pandas/io/parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/pandas/io/parsers.py", line 431, in _read
    filepath_or_buffer, encoding, compression
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/pandas/io/common.py", line 212, in get_filepath_or_buffer
    filepath_or_buffer, encoding=encoding, compression=compression, mode=mode
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/pandas/io/s3.py", line 52, in get_filepath_or_buffer
    file, _fs = get_file_and_filesystem(filepath_or_buffer, mode=mode)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/pandas/io/s3.py", line 42, in get_file_and_filesystem
    file = fs.open(_strip_schema(filepath_or_buffer), mode)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/fsspec/spec.py", line 775, in open
    **kwargs
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/s3fs/core.py", line 378, in _open
    autocommit=autocommit, requester_pays=requester_pays)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/s3fs/core.py", line 1097, in __init__
    cache_type=cache_type)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/fsspec/spec.py", line 1065, in __init__
    self.details = fs.info(path)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/s3fs/core.py", line 530, in info
    Key=key, **version_id_kw(version_id), **self.req_kw)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/s3fs/core.py", line 200, in _call_s3
    return method(**additional_kwargs)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/client.py", line 316, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/client.py", line 622, in _make_api_call
    operation_model, request_dict, request_context)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/client.py", line 641, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/endpoint.py", line 132, in _send_request
    request = self.create_request(request_dict, operation_model)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/endpoint.py", line 116, in create_request
    operation_name=operation_model.name)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/signers.py", line 90, in handler
    return self.sign(operation_name, request)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/signers.py", line 160, in sign
    auth.add_auth(request)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/auth.py", line 357, in add_auth
    raise NoCredentialsError

Problem description

Reading directly from public S3 buckets (without manually configuring the anon parameter via s3fs) is broken in pandas 1.0.4. It worked in 1.0.3.
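Until a fixed release is available, the manual configuration mentioned above could look like the following sketch. The helper name `open_public_s3` and its `make_fs` parameter are hypothetical and not part of pandas or s3fs; the only real API assumed is `s3fs.S3FileSystem(anon=True)` and its `open` method.

```python
def open_public_s3(path, mode="rb", make_fs=None):
    """Open an object in a public bucket using anonymous (unsigned) requests.

    `make_fs` is a hypothetical hook for swapping in another filesystem
    factory (for example in tests); by default s3fs.S3FileSystem is used.
    """
    if make_fs is None:
        # Imported lazily so the helper can be defined without s3fs installed.
        import s3fs
        make_fs = s3fs.S3FileSystem
    # anon=True skips request signing, so no AWS credentials are required.
    fs = make_fs(anon=True)
    return fs.open(path, mode)


# Possible usage with the file from the report (needs s3fs, pandas and network access):
#
#   import pandas as pd
#   with open_public_s3("nyc-tlc/trip data/yellow_tripdata_2019-01.csv") as f:
#       df = pd.read_csv(f)
```

Passing an already opened file object to `pd.read_csv` sidesteps the filesystem creation inside pandas, which is where the failing credential lookup happens.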

It looks like reading from public buckets requires anon=True when creating the filesystem. Commit 22cf0f5dfcfbddd5506fdaf260e485bff1b88ef1 seems to have introduced the issue: anon=False is now passed when the NoCredentialsError is encountered.
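The behaviour described here can be modelled with a small, self-contained sketch. It does not reproduce the pandas source: `FakeS3FileSystem`, the local `NoCredentialsError`, and `open_with_fallback` are hypothetical stand-ins that only mimic the anon flag on a machine without AWS credentials.

```python
class NoCredentialsError(Exception):
    """Hypothetical stand-in for botocore.exceptions.NoCredentialsError."""


class FakeS3FileSystem:
    """Hypothetical stand-in for s3fs.S3FileSystem; only models the anon flag."""

    def __init__(self, anon=False):
        self.anon = anon

    def open(self, path, mode="rb"):
        # Model a machine without credentials: signed requests fail,
        # anonymous requests to a public bucket succeed.
        if not self.anon:
            raise NoCredentialsError("Unable to locate credentials")
        return f"<{mode} handle for {path}>"


def open_with_fallback(path, mode="rb", fallback_anon=True, fs_cls=FakeS3FileSystem):
    """Try a credentialed open first, then retry with the given anon setting."""
    fs = fs_cls(anon=False)
    try:
        return fs.open(path, mode), fs
    except (FileNotFoundError, NoCredentialsError):
        # The retry only helps if it is anonymous. Retrying with anon=False
        # repeats the same failing signed request.
        fs = fs_cls(anon=fallback_anon)
        return fs.open(path, mode), fs


# Anonymous fallback: the retry succeeds.
handle, fs = open_with_fallback("nyc-tlc/trip data/yellow_tripdata_2019-01.csv")

# Fallback with anon=False, as described in the report: the retry fails again.
try:
    open_with_fallback("nyc-tlc/trip data/yellow_tripdata_2019-01.csv", fallback_anon=False)
    retry_failed = False
except NoCredentialsError:
    retry_failed = True
```

In the second call the error is raised while the first one is being handled, which would be consistent with the "During handling of the above exception, another exception occurred" message in the stack trace.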

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.7.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-55-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.4
numpy : 1.18.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.0.2
setuptools : 47.1.1.post20200604
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : None
pyxlsb : None
s3fs : 0.4.2
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

1 reaction
alimcmaster1 commented, Jun 9, 2020

The fix for this to target 1.1 is to set ‘anon=True’ in S3FileSystem https://github.com/pandas-dev/pandas/pull/33632/files#diff-a37b395bed03f0404dec864a4529c97dR41

I’ll wait, as we are moving to fsspec, which gets rid of this logic: https://github.com/pandas-dev/pandas/pull/34266. But we should definitely try using moto to test this.

0 reactions
jorisvandenbossche commented, Jul 8, 2020

Long-term we might want to get away from this logic

On the other hand, it seems nice that reading from a public bucket just works out of the box without needing to pass any option?


