
Unable to specify AWS profile when saving parquet file with s3 url

See original GitHub issue

I would like to be able to specify the named AWS profile to use when uploading a DataFrame as a parquet file to S3. This feature seems to be missing. The workaround is to make the account I want to use the default profile, but that does not let me select the profile programmatically.

Code Sample

$ python
Python 3.6.2 (default, Jul 17 2017, 16:44:45) 
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import pandas as pd
>>> 
>>> s3_url = 's3://my_bucket/foo/bar/example.parquet'
>>> df = pd.DataFrame({'one': [-1, np.nan, 2.5], 'two': ['foo', 'bar', 'baz'], 'three': [True, False, True]})
>>> 
>>> df.to_parquet(s3_url, engine='fastparquet', profile='production')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/user/mytool/env/lib/python3.6/site-packages/pandas/core/frame.py", line 1691, in to_parquet
    compression=compression, **kwargs)
  File "/Users/user/mytool/env/lib/python3.6/site-packages/pandas/io/parquet.py", line 248, in to_parquet
    return impl.write(df, path, compression=compression, **kwargs)
  File "/Users/user/mytool/env/lib/python3.6/site-packages/pandas/io/parquet.py", line 210, in write
    compression=compression, **kwargs)
TypeError: write() got an unexpected keyword argument 'profile'
>>> 

I also tried the profile_name argument; it is rejected in the same way.
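
For anyone needing an interim workaround, here is a minimal sketch that selects a named profile programmatically by relying on the credential chain rather than on a pandas keyword. It assumes the underlying s3fs/boto3 stack honors the AWS_PROFILE environment variable; the bucket, key, and profile names are placeholders.

import os

import numpy as np
import pandas as pd

# Assumption: s3fs resolves credentials through boto3, and boto3 reads
# AWS_PROFILE from the environment. Setting it before the write selects
# the named profile without passing anything extra to pandas.
os.environ["AWS_PROFILE"] = "production"

df = pd.DataFrame({'one': [-1, np.nan, 2.5],
                   'two': ['foo', 'bar', 'baz'],
                   'three': [True, False, True]})

# No profile-related keyword is passed to to_parquet itself.
df.to_parquet('s3://my_bucket/foo/bar/example.parquet', engine='fastparquet')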

Output of pd.show_versions():

INSTALLED VERSIONS

commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0.dev0+273.gcd484cc52
pytest: 3.4.0
pip: 9.0.1
setuptools: 28.8.0
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: 0.8.0
xarray: None
IPython: None
sphinx: None
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: 1.2.2
pymysql: None
psycopg2: None
jinja2: None
s3fs: 0.1.3
fastparquet: 0.1.4
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

2 reactions
drewfustin commented, Mar 9, 2022

In case someone shows up here: the kwarg fsspec is looking for is "profile", so this should be df.to_parquet("s3://...", storage_options={"profile": "production"}).
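
A minimal, self-contained example of that call, for reference. storage_options (available in recent pandas versions) is forwarded to fsspec/s3fs; the bucket, key, and profile names are placeholders.

import numpy as np
import pandas as pd

df = pd.DataFrame({'one': [-1, np.nan, 2.5],
                   'two': ['foo', 'bar', 'baz'],
                   'three': [True, False, True]})

# storage_options is passed through to fsspec/s3fs; s3fs hands "profile"
# to the underlying botocore session.
df.to_parquet('s3://my_bucket/foo/bar/example.parquet',
              storage_options={'profile': 'production'})

# The same option works when reading the file back.
df2 = pd.read_parquet('s3://my_bucket/foo/bar/example.parquet',
                      storage_options={'profile': 'production'})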

0 reactions
TomAugspurger commented, Aug 28, 2020

https://github.com/dask/s3fs/issues/324

I believe that this was properly fixed by https://github.com/pandas-dev/pandas/pull/35381. Once that fix is released, the syntax would be df.to_parquet("s3://...", storage_options={"profile_name": "production"}).
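
For pandas versions that predate the storage_options keyword, a sketch of the same idea is to build the filesystem object explicitly and write with pyarrow. The exact keyword s3fs expects for the profile has varied between releases ("profile" in recent versions, "profile_name" in some older ones), so treat it as an assumption for your installed version; bucket and key names are placeholders.

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import s3fs

# Assumption: this s3fs release accepts "profile"; some older releases used "profile_name".
fs = s3fs.S3FileSystem(profile='production')

df = pd.DataFrame({'one': [1, 2, 3], 'two': ['foo', 'bar', 'baz']})
table = pa.Table.from_pandas(df)

# Write the parquet bytes through a file handle opened on the named-profile filesystem.
with fs.open('my_bucket/foo/bar/example.parquet', 'wb') as f:
    pq.write_table(table, f)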


Top Results From Across the Web

Running Pyspark locally to access parquet file in S3 Error
python - Running Pyspark locally to access parquet file in S3 Error: "Unable to load AWS credentials from any provider in the chain"...

Not able to read S3 Parquet file - AWS re:Post
Hi Team, I'm trying to read Parquet files in S3, but I get the following error. Please help. I'm not sure if the...

Using the Parquet format in AWS Glue
You can use AWS Glue to read Parquet files from Amazon S3 and from streaming sources ... Configuration: In your function options, specify...

Resolve errors uploading data to or downloading data from ...
I want to upload data to Amazon Aurora from Amazon Simple Storage Service (Amazon S3). -or-. I want to download data from Amazon...

Resolve "Access Denied" errors when running Athena queries
The Amazon Simple Storage Service (Amazon S3) bucket policies don't allow the required permissions to the IAM user.
