
Add generic buffer (s3) support to read_hdf


Hey folks, this is not a bug but a question/feature request.

Today, read_hdf does not support reading HDF files directly from S3.

If you pass an S3 URL directly, as you can with read_csv, you get a "file does not exist" error:

>>> import pandas as pd
>>> df = pd.read_hdf('s3://mybucket/myfile.h5', 'df')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/pandas/io/pytables.py", line 395, in read_hdf
    raise FileNotFoundError(f"File {path_or_buf} does not exist")
FileNotFoundError: File s3://mybucket/myfile.h5 does not exist

And if we pass a file-like object as the path, we get an error saying that support for generic buffers has not been implemented:

>>> import pandas as pd
>>> from s3fs import S3FileSystem
>>> s3 = S3FileSystem(anon=False)
>>> df = pd.read_hdf(s3.open('mybucket/myfile.h5', mode='rb'), 'df')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/pandas/io/pytables.py", line 385, in read_hdf
    "Support for generic buffers has not been implemented."
NotImplementedError: Support for generic buffers has not been implemented.

Are there any plans to implement generic buffer support for read_hdf, so that we could read from S3 directly?

I took a quick look, and this seems to be a restriction in PyTables (tables), which doesn't support S3 or file-like objects, but I'm not sure.
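One possible way around the file-like restriction, assuming the whole file fits in memory, is to read the bytes yourself and hand them to HDF5's in-memory "core" driver. PyTables' open_file accepts driver="H5FD_CORE" together with driver_core_image and driver_core_backing_store=0, and pd.HDFStore passes extra keyword arguments through to tables.open_file. Relying on that pass-through is an assumption on my part, not a documented read_hdf feature, and the helper name read_hdf_from_bytes and the placeholder file name are hypothetical.

```python
import pandas as pd


def read_hdf_from_bytes(data: bytes, key: str) -> pd.DataFrame:
    """Hypothetical helper: read one key from an HDF5 file image held in memory.

    Assumes PyTables is installed and that pd.HDFStore forwards the extra
    keyword arguments below to tables.open_file.
    """
    with pd.HDFStore(
        "in-memory.h5",               # placeholder name; the data comes from `data`
        mode="r",
        driver="H5FD_CORE",           # HDF5 "core" (in-memory) driver
        driver_core_image=data,       # raw bytes of the .h5 file
        driver_core_backing_store=0,  # never write the image back to disk
    ) as store:
        return store.get(key)
```

With the S3FileSystem from the issue, a call might look like read_hdf_from_bytes(s3.open('mybucket/myfile.h5', mode='rb').read(), 'df'). Because the full object is pulled into RAM, this does not help with files larger than available memory.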

Issue Analytics

  • State: open
  • Created 4 years ago
  • Reactions: 4
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

2 reactions
jreback commented, Feb 12, 2020

This is not likely, as HDF5 doesn't have much support for this.

1 reaction
Hagibaer commented, Apr 6, 2022

What's the current status of this issue? I faced a similar problem and was wondering if there is an update on this, or if the workaround is still to download the files to the local file system first. In the mentioned issue someone wanted to work on this, but apparently nothing happened. Any feedback is greatly appreciated 😃
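The workaround mentioned in this comment, downloading the file to the local file system first, can be sketched as follows. This is an illustrative sketch, not a pandas API: the helper name read_hdf_via_tempfile is hypothetical, and it assumes the remote object can be opened as a readable binary stream, as s3.open(..., mode='rb') does in the issue.

```python
import os
import shutil
import tempfile
from typing import BinaryIO

import pandas as pd


def read_hdf_via_tempfile(buf: BinaryIO, key: str) -> pd.DataFrame:
    """Hypothetical helper: copy a readable binary stream (e.g. an s3fs file
    object) to a local temporary file, then read it with pd.read_hdf.
    """
    # delete=False so the file survives being closed; we remove it in `finally`
    tmp = tempfile.NamedTemporaryFile(suffix=".h5", delete=False)
    try:
        with tmp:
            shutil.copyfileobj(buf, tmp)  # copy in chunks instead of one big read
        # the file is closed now, so PyTables can open it by path
        return pd.read_hdf(tmp.name, key)
    finally:
        os.remove(tmp.name)
```

With the filesystem from the issue, usage would be along the lines of: with s3.open('mybucket/myfile.h5', mode='rb') as f: df = read_hdf_via_tempfile(f, 'df'). Unlike an in-memory approach, this needs local disk space for the file but not RAM for the whole object.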
