BUG: s3 reads from public buckets not working
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample
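import pandas as pd

# Public bucket, read with no AWS credentials configured.
# Fails with NoCredentialsError on pandas 1.0.4 (worked on 1.0.3).
df = pd.read_csv("s3://nyc-tlc/trip data/yellow_tripdata_2019-01.csv")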
Error stack trace
Traceback (most recent call last):
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/pandas/io/s3.py", line 33, in get_file_and_filesystem
    file = fs.open(_strip_schema(filepath_or_buffer), mode)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/fsspec/spec.py", line 775, in open
    **kwargs
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/s3fs/core.py", line 378, in _open
    autocommit=autocommit, requester_pays=requester_pays)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/s3fs/core.py", line 1097, in __init__
    cache_type=cache_type)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/fsspec/spec.py", line 1065, in __init__
    self.details = fs.info(path)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/s3fs/core.py", line 530, in info
    Key=key, **version_id_kw(version_id), **self.req_kw)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/s3fs/core.py", line 200, in _call_s3
    return method(**additional_kwargs)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/client.py", line 316, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/client.py", line 622, in _make_api_call
    operation_model, request_dict, request_context)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/client.py", line 641, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/endpoint.py", line 132, in _send_request
    request = self.create_request(request_dict, operation_model)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/endpoint.py", line 116, in create_request
    operation_name=operation_model.name)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/signers.py", line 90, in handler
    return self.sign(operation_name, request)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/signers.py", line 160, in sign
    auth.add_auth(request)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/auth.py", line 357, in add_auth
    raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/pandas/io/parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/pandas/io/parsers.py", line 431, in _read
    filepath_or_buffer, encoding, compression
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/pandas/io/common.py", line 212, in get_filepath_or_buffer
    filepath_or_buffer, encoding=encoding, compression=compression, mode=mode
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/pandas/io/s3.py", line 52, in get_filepath_or_buffer
    file, _fs = get_file_and_filesystem(filepath_or_buffer, mode=mode)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/pandas/io/s3.py", line 42, in get_file_and_filesystem
    file = fs.open(_strip_schema(filepath_or_buffer), mode)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/fsspec/spec.py", line 775, in open
    **kwargs
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/s3fs/core.py", line 378, in _open
    autocommit=autocommit, requester_pays=requester_pays)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/s3fs/core.py", line 1097, in __init__
    cache_type=cache_type)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/fsspec/spec.py", line 1065, in __init__
    self.details = fs.info(path)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/s3fs/core.py", line 530, in info
    Key=key, **version_id_kw(version_id), **self.req_kw)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/s3fs/core.py", line 200, in _call_s3
    return method(**additional_kwargs)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/client.py", line 316, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/client.py", line 622, in _make_api_call
    operation_model, request_dict, request_context)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/client.py", line 641, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/endpoint.py", line 132, in _send_request
    request = self.create_request(request_dict, operation_model)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/endpoint.py", line 116, in create_request
    operation_name=operation_model.name)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/signers.py", line 90, in handler
    return self.sign(operation_name, request)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/signers.py", line 160, in sign
    auth.add_auth(request)
  File "/home/conda/envs/pandas-test/lib/python3.7/site-packages/botocore/auth.py", line 357, in add_auth
    raise NoCredentialsError
Problem description
Reading directly from public S3 buckets (without manually configuring the anon parameter via s3fs) is broken in pandas 1.0.4; it worked in 1.0.3.
Reading from a public bucket appears to require anon=True when the filesystem is created. Commit 22cf0f5dfcfbddd5506fdaf260e485bff1b88ef1 seems to have introduced the issue: the fallback that runs when a NoCredentialsError is encountered now passes anon=False.
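The fallback described above can be sketched as follows. This is a minimal, stdlib-only illustration, not the actual pandas source: `open_with_anon_fallback` is a hypothetical helper, `make_fs(anon=...)` stands in for `s3fs.S3FileSystem(anon=...)`, and `NoCredentialsError` is a hypothetical local stand-in for `botocore.exceptions.NoCredentialsError`. The second traceback (pandas/io/s3.py, line 42) shows the retry itself failing, which is consistent with the retry not being anonymous.

```python
from typing import Any, Callable, Tuple


class NoCredentialsError(Exception):
    """Hypothetical stand-in for botocore.exceptions.NoCredentialsError."""


def open_with_anon_fallback(
    make_fs: Callable[..., Any], path: str, mode: str = "rb"
) -> Tuple[Any, Any]:
    """Open `path`, trying a credentialed filesystem first and falling back
    to an anonymous one if no credentials can be found.

    `make_fs(anon=...)` plays the role of s3fs.S3FileSystem(anon=...).
    """
    fs = make_fs(anon=False)
    try:
        return fs.open(path, mode), fs
    except NoCredentialsError:
        # The retry must be anonymous (unsigned). Retrying with anon=False
        # would try to sign the request again and hit the same error.
        fs = make_fs(anon=True)
        return fs.open(path, mode), fs
```

With a fake filesystem that only succeeds when `anon=True`, the helper returns the anonymous filesystem; changing the retry to `make_fs(anon=False)` makes it raise instead, which mirrors the reported regression.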
Output of pd.show_versions()
INSTALLED VERSIONS
commit           : None
python           : 3.7.7.final.0
python-bits      : 64
OS               : Linux
OS-release       : 4.15.0-55-generic
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.0.4
numpy            : 1.18.1
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 20.0.2
setuptools       : 47.1.1.post20200604
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : None
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 0.15.1
pytables         : None
pytest           : None
pyxlsb           : None
s3fs             : 0.4.2
scipy            : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None
numba            : None
Top GitHub Comments
The fix for this, targeting 1.1, is to set anon=True in S3FileSystem: https://github.com/pandas-dev/pandas/pull/33632/files#diff-a37b395bed03f0404dec864a4529c97dR41
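Until a release contains that fix, one possible workaround on 1.0.4 is to build the filesystem with anon=True yourself and hand pandas an already-open file object, so the credential fallback in pandas/io/s3.py is never reached. A minimal sketch under stated assumptions: `read_csv_from_filesystem` is a hypothetical helper, and `fs` is assumed to be any fsspec-style object whose `open(path, "rb")` returns a binary file usable as a context manager.

```python
from typing import Any

import pandas as pd


def read_csv_from_filesystem(fs: Any, path: str, **read_csv_kwargs: Any) -> pd.DataFrame:
    """Read a CSV through a filesystem object the caller has configured.

    Because pandas only receives an open file object rather than an
    s3:// URL, its own S3 credential handling is not involved.
    """
    with fs.open(path, "rb") as f:
        return pd.read_csv(f, **read_csv_kwargs)
```

With s3fs installed, this would be called as `read_csv_from_filesystem(s3fs.S3FileSystem(anon=True), "nyc-tlc/trip data/yellow_tripdata_2019-01.csv")`. On pandas 1.2 and later, `pd.read_csv(..., storage_options={"anon": True})` forwards the same option to the filesystem directly.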
I'll wait, as we are moving to fsspec, which gets rid of this logic (https://github.com/pandas-dev/pandas/pull/34266), but we should definitely try using moto to test this.
On the other hand, isn't it nice that reading from a public bucket just works out of the box, without needing to pass any option?