Error reading HDF-5 file with s3fs from a private object store
To assist reproducing bugs, please include the following:
- Operating System: Linux (CentOS 7.x)
- Python version: 3.7.5
- Where Python was acquired: Anaconda (h0371630_0)
- h5py version 2.10.0 from conda-forge (nompi_py37h513d04c_100) (observed same with 2.9.0)
- HDF5 version: 1.10.5
- s3fs version: 0.3.5
- h5py detailed version:
Summary of the h5py configuration
---------------------------------
h5py 2.10.0
HDF5 1.10.5
Python 3.7.5 (default, Oct 25 2019, 15:51:11)
[GCC 7.3.0]
sys.platform linux
sys.maxsize 9223372036854775807
numpy 1.17.3
- The full traceback/stack trace shown
Traceback (most recent call last):
  File "h5py/_objects.pyx", line 193, in h5py._objects.ObjectID.__dealloc__
  File "h5py/defs.pyx", line 1234, in h5py.defs.H5Idec_ref
  File "h5py/h5fd.pyx", line 169, in h5py.h5fd.H5FD_fileobj_write
AttributeError: 'S3File' object has no attribute 'seek'
Exception ignored in: 'h5py._objects.ObjectID.__dealloc__'
Traceback (most recent call last):
  File "h5py/_objects.pyx", line 193, in h5py._objects.ObjectID.__dealloc__
  File "h5py/defs.pyx", line 1234, in h5py.defs.H5Idec_ref
  File "h5py/h5fd.pyx", line 169, in h5py.h5fd.H5FD_fileobj_write
AttributeError: 'S3File' object has no attribute 'seek'
The code to reproduce:
import s3fs
import h5py

s3_fs = s3fs.S3FileSystem(
    anon=False,
    client_kwargs={'endpoint_url': 'https://internal-obj-url:9021'}
)
f = h5py.File(s3_fs.open("my-bucket/my-file.h5"))
(I have also tried f = h5py.File(s3_fs.open("s3://my-bucket/my-file.h5")) with the same results.)
I have observed this with both the 2.9.0 and 2.10.0 versions of h5py. I have tested s3fs separately on the same object by calling open() and seek(), and both work fine. Additionally, the same mechanism works with the same object store and s3fs to pull CSV data into Pandas.
@votavap Did you try using context managers? I had a similar issue a while back, but using nested context managers solved it (and still works).
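For readers following along, the nested context manager suggestion might look roughly like the sketch below. It is not the commenter's actual code: the helper name read_dataset and the dataset name in the usage comment are hypothetical. Passing "r" to h5py.File is an added assumption. The traceback above goes through H5FD_fileobj_write, which suggests the file was not opened read-only, so requesting read-only mode explicitly seems like a reasonable thing to try.

```python
def read_dataset(fs, path, dataset):
    """Read one dataset from an HDF5 object through a file-like handle.

    `fs` is any filesystem object with an fsspec-style open(path, mode)
    method, for example an s3fs.S3FileSystem instance.
    """
    import h5py  # imported here so the helper can be defined without h5py loaded

    # Outer context: the remote file handle, opened read-only in binary mode.
    with fs.open(path, "rb") as remote:
        # Inner context: h5py opened explicitly read-only (an assumption of
        # this sketch, not something stated in the thread).
        with h5py.File(remote, "r") as h5:
            # Copy the data into memory before both contexts close.
            return h5[dataset][()]


# Hypothetical usage, reusing the endpoint and object from the report:
#
#   import s3fs
#   s3_fs = s3fs.S3FileSystem(
#       anon=False,
#       client_kwargs={'endpoint_url': 'https://internal-obj-url:9021'},
#   )
#   data = read_dataset(s3_fs, "my-bucket/my-file.h5", "some/dataset")
```

Because the h5py file is closed before the s3fs handle, nothing is left for the interpreter to clean up in an arbitrary order at exit, which may be why this pattern avoids the error raised from __dealloc__.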
Thanks, I will give it a try. My guess is that it may work better for some access patterns, but not in general. I am closing this one as well, as it seems resolved.