ENH: support `storage_options` argument in `read_parquet`
Is your feature request related to a problem?
I store lots of data in a quilt bucket (i.e. S3 storage) and use s3fs with geopandas to read data directly from the wire, like `gpd.read_parquet("s3://spatial-ucr/census/administrative/counties.parquet")`.
Often that works perfectly, but depending on the botocore/s3fs/aiobotocore/fsspec version combination, it can throw `botocore.exceptions.NoCredentialsError: Unable to locate credentials`.
Describe the solution you’d like
The pandas version of `read_parquet` supports passing `storage_options={"anon": True}`, which I believe will get around that particular error, but in geopandas that argument fails with `TypeError: read_table() got an unexpected keyword argument 'storage_options'`. It would be great if `gpd.read_parquet` would accept that argument as well.
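For reference, the desired call would look something like the following (hypothetical, since `storage_options` is exactly the argument being requested here):

```python
import geopandas as gpd

# Pass anonymous-access options down to the underlying S3 filesystem,
# mirroring the pandas.read_parquet API.
gdf = gpd.read_parquet(
    "s3://spatial-ucr/census/administrative/counties.parquet",
    storage_options={"anon": True},
)
```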
API breaking implications
None
Describe alternatives you’ve considered
I could probably read the file directly with pandas, then convert the serialized geometry column myself, but that would skirt the nice efficient implementation already in the geopandas version of read_parquet 😃
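A rough sketch of that manual workaround, assuming a geopandas version that provides `GeoSeries.from_wkb` and that the geometry column is named `"geometry"` and stored as WKB (as in the geopandas Parquet layout):

```python
import geopandas as gpd
import pandas as pd

# Read the raw table with pandas, which already supports storage_options...
df = pd.read_parquet(
    "s3://spatial-ucr/census/administrative/counties.parquet",
    storage_options={"anon": True},
)

# ...then decode the WKB-serialized geometry column by hand.
df["geometry"] = gpd.GeoSeries.from_wkb(df["geometry"])
gdf = gpd.GeoDataFrame(df, geometry="geometry")
```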
And on the original topic: I think it's a good idea to add support for the `storage_options` keyword (there are other aspects you might want to tweak, like the region, endpoint, etc.). Although it's in theory superfluous with passing an actual filesystem object (you can create an `s3fs` filesystem with those same storage_options, and pyarrow will accept an s3fs filesystem as well), it gives consistency with pandas and dask (and dask-geopandas). Implementation-wise, I think we can do something like:
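A minimal sketch of one possible approach, resolving `storage_options` into an fsspec filesystem before handing the path to pyarrow (`_read_table` is an illustrative helper name, not the actual geopandas code, and fsspec is assumed to be installed):

```python
import fsspec
import pyarrow.parquet as pq

def _read_table(path, storage_options=None, **kwargs):
    filesystem = None
    if storage_options is not None:
        # Infer the protocol (e.g. "s3") from the path and forward
        # storage_options (e.g. {"anon": True}) to the filesystem class.
        # url_to_fs also strips the scheme from the returned path.
        filesystem, path = fsspec.core.url_to_fs(path, **storage_options)
    return pq.read_table(path, filesystem=filesystem, **kwargs)
```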
One guess: it might be that if you pass an explicit filesystem object, you need to leave out the `s3://` from the file path.
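Something along these lines, for example (an untested guess following that comment; note the path has no scheme prefix):

```python
import s3fs
import pyarrow.parquet as pq

# Build the filesystem explicitly, with the same options that
# storage_options would otherwise carry...
fs = s3fs.S3FileSystem(anon=True)

# ...and pass the path without the "s3://" prefix.
table = pq.read_table(
    "spatial-ucr/census/administrative/counties.parquet",
    filesystem=fs,
)
```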