Error reading HDF-5 file with s3fs from a private object store

To assist reproducing bugs, please include the following:

  • Operating System: Linux (CentOS 7.x)
  • Python version: 3.7.5
  • Where Python was acquired: Anaconda (h0371630_0)
  • h5py version 2.10.0 from conda-forge (nompi_py37h513d04c_100) (observed same with 2.9.0)
  • HDF5 version: 1.10.5
  • s3fs version: 0.3.5
  • h5py detailed version:
Summary of the h5py configuration
---------------------------------

h5py    2.10.0
HDF5    1.10.5
Python  3.7.5 (default, Oct 25 2019, 15:51:11)
[GCC 7.3.0]
sys.platform    linux
sys.maxsize     9223372036854775807
numpy   1.17.3
  • The full traceback/stack trace shown
Traceback (most recent call last):
  File "h5py/_objects.pyx", line 193, in h5py._objects.ObjectID.__dealloc__
  File "h5py/defs.pyx", line 1234, in h5py.defs.H5Idec_ref
  File "h5py/h5fd.pyx", line 169, in h5py.h5fd.H5FD_fileobj_write
AttributeError: 'S3File' object has no attribute 'seek'
Exception ignored in: 'h5py._objects.ObjectID.__dealloc__'
Traceback (most recent call last):
  File "h5py/_objects.pyx", line 193, in h5py._objects.ObjectID.__dealloc__
  File "h5py/defs.pyx", line 1234, in h5py.defs.H5Idec_ref
  File "h5py/h5fd.pyx", line 169, in h5py.h5fd.H5FD_fileobj_write
AttributeError: 'S3File' object has no attribute 'seek'

The code to reproduce:

import s3fs
import h5py

s3_fs = s3fs.S3FileSystem(
    anon=False,
    client_kwargs={'endpoint_url': 'https://internal-obj-url:9021'},
)
f = h5py.File(s3_fs.open("my-bucket/my-file.h5"))

(I have also tried f = h5py.File(s3_fs.open("s3://my-bucket/my-file.h5")) with the same results)

I have observed this with both the 2.9.0 and 2.10.0 versions of h5py. I have tested s3fs separately on the same object, calling open() and seek(), and both work fine. Additionally, the same mechanism works with the same object store and s3fs to pull CSV data into pandas.
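
The manual open() and seek() test described above can be turned into a small check that runs before the object is handed to h5py. This is only a sketch: the method lists follow what the h5py documentation asks of file-like objects (read() or readinto(), seek(), tell(), plus write(), truncate() and flush() for writing), and the helper `missing_methods` and the class `NoSeek` are hypothetical names introduced for illustration.

```python
import io

# Methods a file-like object is expected to offer (per the h5py docs; the docs
# also accept readinto() in place of read(), which this sketch does not check).
READ_METHODS = ("read", "seek", "tell")
WRITE_METHODS = ("write", "truncate", "flush")


def missing_methods(fileobj, writable=False):
    """Return the names of expected methods that fileobj does not provide."""
    required = READ_METHODS + (WRITE_METHODS if writable else ())
    return [name for name in required if not callable(getattr(fileobj, name, None))]


class NoSeek:
    """Hypothetical stand-in for a file object that lacks seek()."""

    def read(self, size=-1):
        return b""

    def tell(self):
        return 0


print(missing_methods(io.BytesIO(b"data"), writable=True))  # []
print(missing_methods(NoSeek()))  # ['seek']
```

Calling `missing_methods(s3_fs.open("my-bucket/my-file.h5"))` on the object from the reproduction would show whether `seek` is really absent at that point, or only disappears later during teardown.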

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

xavfernandez commented, Dec 3, 2019 (2 reactions)

@votavap Did you try with context managers? I had a similar issue a while back, but using nested context managers solved it (and still works):

    s3 = s3fs.S3FileSystem()
    with s3.open("my-bucket/my-file.h5") as s3f:
        with h5py.File(s3f, 'r') as f:
            pass
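
One possible reading of why this helps, offered as an assumption rather than something confirmed in the thread: the traceback is raised from `ObjectID.__dealloc__`, that is, while h5py objects are being torn down, and nested `with` blocks make the teardown order explicit. The sketch below uses a hypothetical `Resource` class (no h5py or s3fs involved) purely to show that the inner context is closed before the outer one.

```python
class Resource:
    """Hypothetical stand-in for a file handle that records open/close events."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __enter__(self):
        self.log.append(f"open {self.name}")
        return self

    def __exit__(self, exc_type, exc, tb):
        self.log.append(f"close {self.name}")
        return False  # do not suppress exceptions


def run_nested():
    log = []
    with Resource("s3 file", log):        # outer: plays the role of s3.open(...)
        with Resource("hdf5 file", log):  # inner: plays the role of h5py.File(s3f, 'r')
            log.append("read")
    return log


print(run_nested())
# ['open s3 file', 'open hdf5 file', 'read', 'close hdf5 file', 'close s3 file']
```

With this ordering the HDF5 handle is closed while the underlying file object is still open. Note also that the snippet above passes the mode 'r' explicitly, whereas the reproduction code does not pass a mode at all.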

votavap commented, Dec 3, 2019 (0 reactions)

Thanks, I will give it a try. My guess is that it may work better for some access patterns, but not in general. I am closing this one as well, as it seems resolved.
