Add generic buffer (s3) support to read_hdf

Hey folks, this is not a bug but a question/feature request.

Today read_hdf does not support reading hdf files directly from s3.

If you try to pass an s3 URL directly, as you can with read_csv, you get a "file does not exist" error:

>>> import pandas as pd
>>> df = pd.read_hdf('s3://mybucket/myfile.h5', 'df')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/pandas/io/pytables.py", line 395, in read_hdf
    raise FileNotFoundError(f"File {path_or_buf} does not exist")
FileNotFoundError: File s3://mybucket/myfile.h5 does not exist

And if we try to pass a file-like object as the path, we get an error saying that support for generic buffers is not implemented:

>>> import pandas as pd
>>> from s3fs import S3FileSystem
>>> s3 = S3FileSystem(anon=False)
>>> df = pd.read_hdf(s3.open('mybucket/myfile.h5', mode='rb'), 'df')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/pandas/io/pytables.py", line 385, in read_hdf
    "Support for generic buffers has not been implemented."
NotImplementedError: Support for generic buffers has not been implemented.

Are there any plans to implement generic buffers for read_hdf, so that we could read from s3 directly?

I took a quick look, and this seems to be a restriction in tables (PyTables), which does not support s3 or file-like objects, but I’m not sure.

Issue Analytics

State:
Created 4 years ago
Reactions: 4
Comments: 7 (4 by maintainers)

Top GitHub Comments

2 reactions

jreback commented, Feb 12, 2020

This is not likely, as HDF5 doesn’t have much support for this.

1 reaction

Hagibaer commented, Apr 6, 2022

What's the current status of this issue? I faced a similar problem and was wondering whether there is an update on this, or whether the workaround is still to download the files to the local file system first. In the mentioned issue someone wanted to work on this, but apparently nothing happened. Any feedback is greatly appreciated 😃
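
For anyone landing here, the download-first workaround mentioned above might look roughly like the sketch below. It is only an illustration, not something pandas provides: the helper name read_hdf_via_tempfile and its reader parameter are made up for this example, and it assumes the whole file fits on local disk and that the stored object is read fully into memory (the default behaviour of read_hdf).

```python
import shutil
import tempfile
from pathlib import Path


def read_hdf_via_tempfile(remote_file, key, reader=None):
    """Copy a binary file-like object to a local temp file, then read it.

    `remote_file` is any object with a binary .read(), for example the
    handle returned by s3.open(...) in the issue text. `reader` is a
    hypothetical hook that defaults to pandas.read_hdf; it exists only so
    the copy step can be exercised without pandas installed.
    """
    if reader is None:
        import pandas as pd  # lazy import; read_hdf also needs PyTables

        reader = pd.read_hdf

    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = Path(tmp_dir) / "download.h5"
        # Stream the remote bytes to disk in chunks.
        with open(local_path, "wb") as local_file:
            shutil.copyfileobj(remote_file, local_file)
        # Read while the temporary file still exists; the directory is
        # deleted as soon as this block exits.
        return reader(str(local_path), key)
```

With the s3fs handle from the issue, usage would be something like `with s3.open('mybucket/myfile.h5', mode='rb') as f: df = read_hdf_via_tempfile(f, 'df')`. This does not make read_hdf accept buffers; it just hides the local copy behind one call.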
