Unable to specify AWS profile when saving parquet file with s3 url
I would like to be able to specify the named AWS profile to use when uploading a dataframe as a parquet file to S3. This feature seems to be missing. The workaround is to make the account I want to use the default profile, but that does not let me select the profile programmatically.
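The workaround above can be approximated programmatically by setting the `AWS_PROFILE` environment variable before writing, since boto3 (which s3fs uses under the hood) reads it when no explicit profile is given. A minimal sketch (the bucket path is the placeholder from the example below; the actual upload is shown commented out because it requires live S3 credentials):

```python
import os

# boto3/s3fs fall back to the AWS_PROFILE environment variable when no
# profile is passed explicitly, so selecting the profile programmatically
# amounts to exporting it before the upload.
os.environ["AWS_PROFILE"] = "production"

# df.to_parquet("s3://my_bucket/foo/bar/example.parquet")  # would now use 'production'
```

This only works per-process, though, and still does not allow different profiles for different writes within the same session.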
Code Sample
$ python
Python 3.6.2 (default, Jul 17 2017, 16:44:45)
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import pandas as pd
>>>
>>> s3_url = 's3://my_bucket/foo/bar/example.parquet'
>>> df = pd.DataFrame({'one': [-1, np.nan, 2.5], 'two': ['foo', 'bar', 'baz'], 'three': [True, False, True]})
>>>
>>> df.to_parquet(s3_url, engine='fastparquet', profile='production')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/user/mytool/env/lib/python3.6/site-packages/pandas/core/frame.py", line 1691, in to_parquet
compression=compression, **kwargs)
File "/Users/user/mytool/env/lib/python3.6/site-packages/pandas/io/parquet.py", line 248, in to_parquet
return impl.write(df, path, compression=compression, **kwargs)
File "/Users/user/mytool/env/lib/python3.6/site-packages/pandas/io/parquet.py", line 210, in write
compression=compression, **kwargs)
TypeError: write() got an unexpected keyword argument 'profile'
>>>
I also tried the `profile_name` argument; that also appears to be unsupported.
Output of `pd.show_versions()`:
INSTALLED VERSIONS
commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.0.dev0+273.gcd484cc52
pytest: 3.4.0
pip: 9.0.1
setuptools: 28.8.0
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: 0.8.0
xarray: None
IPython: None
sphinx: None
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: 1.2.2
pymysql: None
psycopg2: None
jinja2: None
s3fs: 0.1.3
fastparquet: 0.1.4
pandas_gbq: None
pandas_datareader: None
Issue Analytics
- Created 6 years ago
- Comments: 6 (4 by maintainers)
In case someone shows up here: the kwarg fsspec is looking for is `"profile"`, so this should be `df.to_parquet("s3://...", storage_options={"profile": "production"})`. See https://github.com/dask/s3fs/issues/324.
I believe that this was properly fixed by https://github.com/pandas-dev/pandas/pull/35381. The syntax would be `df.to_parquet("s3://...", storage_options={"profile_name": "production"})` once that's fixed.
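Putting the resolution together, a sketch of the modern approach on pandas ≥ 1.2, where `storage_options` is forwarded to fsspec/s3fs (note the two comments above differ on the key name; current s3fs accepts `"profile"`, older releases used `"profile_name"`). The actual write is shown commented out because it needs real S3 credentials and a real bucket:

```python
import numpy as np
import pandas as pd

# Same frame as in the original report.
df = pd.DataFrame({"one": [-1, np.nan, 2.5],
                   "two": ["foo", "bar", "baz"],
                   "three": [True, False, True]})

# storage_options is passed through verbatim to the fsspec filesystem
# (s3fs for s3:// URLs), which is where the profile is selected.
storage_options = {"profile": "production"}

# df.to_parquet("s3://my_bucket/foo/bar/example.parquet",
#               storage_options=storage_options)
```

This keeps profile selection per-call rather than per-process, which was the original request.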