Unusual vlen type Attribute can be created but not read.
First, I've done my testing in the following setup:
h5py 3.1.0
HDF5 1.12.0
Python 3.9.1 (default, Jan 20 2021, 00:00:00)
[GCC 10.2.1 20201125 (Red Hat 10.2.1-9)]
sys.platform linux
sys.maxsize 9223372036854775807
numpy 1.20.0
cython (built with) 0.29.21
numpy (built against) 1.19.3
HDF5 (built against) 1.12.0
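(This summary is the kind of thing printed by h5py's build/version info, which I believe can be reproduced with something like the following:)
>>> import h5py
>>> print(h5py.version.info)  # prints the version/build summary shown above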
However, other people using the package I wrote that creates this unusual Attribute have been reporting the problem on h5py 3.0.0. I tested the same code with 2.10.0 earlier today with no problem, and I have been using the Attribute successfully in every h5py 2.x.y version since 2.3 (the first version where this Attribute could even be created, since 2.2 did not support writing it).
The Attribute is an array of vlens of type numpy.dtype('S1').
The following code creates and writes the Attribute, but then gets an error when reading it back:
>>> import numpy, h5py
>>> dt = h5py.vlen_dtype(numpy.dtype('S1'))
>>> a = numpy.empty((1, ), dtype=dt)
>>> a[0] = numpy.array([b'a', b'b'], dtype='S1')
>>> f = h5py.File('data.h5', mode='a')
>>> f.attrs.create('test', a)
>>> f.attrs['test']
with the following output
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-b22c92ac9bc4> in <module>
5 f = h5py.File('data.h5', mode='a')
6 f.attrs.create('test', a)
----> 7 f.attrs['test']
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
~/.local/lib/python3.9/site-packages/h5py/_hl/attrs.py in __getitem__(self, name)
75
76 arr = numpy.ndarray(shape, dtype=dtype, order='C')
---> 77 attr.read(arr, mtype=htype)
78
79 string_info = h5t.check_string_dtype(dtype)
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/h5a.pyx in h5py.h5a.AttrID.read()
h5py/_proxy.pyx in h5py._proxy.attr_rw()
h5py/_conv.pyx in h5py._conv.vlen2ndarray()
h5py/_conv.pyx in h5py._conv.conv_vlen2ndarray()
ValueError: data type must provide an itemsize
Similar results are obtained when trying to read it with f.attrs.get('test'), f.attrs.items(), and f.attrs.values().
I also can't read it at a lower level by
>>> id = f.attrs.get_id('test')
>>> b = numpy.zeros(id.shape, dtype=id.dtype)
>>> id.read(b)
and get the same error (probably because those other methods do this under the hood).
For reference, I ran h5dump data.h5 on the file and got the following result:
HDF5 "data.h5" {
GROUP "/" {
ATTRIBUTE "test" {
DATATYPE H5T_VLEN { H5T_STRING {
STRSIZE 1;
STRPAD H5T_STR_NULLPAD;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
}}
DATASPACE SIMPLE { ( 1 ) / ( 1 ) }
DATA {
(0): ("a", "b")
}
}
}
}
I do not know whether this is a bug, an intentional feature drop, or something else. If it is a bug and gets fixed in the next h5py release, I will still need a way to read this Attribute in the versions that have the problem, since I have to support all h5py versions from 2.3 to the present with full functionality. It is almost surely readable using the low-level libhdf5 bindings in h5py, but to be honest I am not sure where to start or how stable those bindings are. How would one read this kind of Attribute in h5py 3.x in its present state?
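For anyone stuck on an affected 3.x release, here is a rough, untested sketch of one possible low-level approach, along the lines of the workaround discussed in the comments below. The idea is to ask HDF5 for the raw hvl_t records by passing the attribute's own file type as the memory type, which should sidestep the broken vlen-to-ndarray conversion, and then copy each element's bytes out with ctypes.memmove. It assumes the usual hvl_t layout of {size_t len; void *p}; the helper name read_vlen_attr_raw is made up for the example, and the buffers HDF5 allocates for the vlen data are not reclaimed here.
import ctypes
import numpy
import h5py

def read_vlen_attr_raw(obj, name, base_dtype=numpy.dtype('S1')):
    # Hypothetical fallback reader (untested sketch): fetch the raw hvl_t
    # records {size_t len; void *p} by passing the attribute's own file
    # type as the memory type, then copy each element's bytes into a
    # freshly allocated ndarray with ctypes.memmove.
    aid = obj.attrs.get_id(name)            # low-level AttrID
    ftype = aid.get_type()                  # file datatype (H5T_VLEN of S1)
    # uintp for both fields so the record matches hvl_t on 32- and 64-bit
    # builds (size_t and void* are both pointer-sized there).
    hvl = numpy.dtype([('len', numpy.uintp), ('ptr', numpy.uintp)])
    raw = numpy.empty(aid.shape, dtype=hvl)
    aid.read(raw, mtype=ftype)              # identity conversion: raw hvl_t out
    out = numpy.empty(aid.shape, dtype=object)
    for idx in numpy.ndindex(aid.shape):
        n = int(raw[idx]['len'])
        elem = numpy.empty(n, dtype=base_dtype)
        if n:
            ctypes.memmove(elem.ctypes.data, int(raw[idx]['ptr']),
                           n * base_dtype.itemsize)
        out[idx] = elem
    return out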
Created 3 years ago; 19 comments (13 by maintainers)
Thank you. The release fixed it.
By the way, I found out from someone that the workaround I posted earlier segfaults on 32-bit systems. I think it boiled down to the difference in size between the pointers in the file and on the system not being taken into account. Anyhow, the following version doesn't segfault on 32-bit or 64-bit little-endian systems (who knows about big-endian) and also doesn't read the size_t as an intp.
No need to re-open this again, but I made a slightly improved version for anyone else reading this who needs the workaround. This version avoids making any intermediate copies and instead allocates the ndarray first and uses ctypes.memmove to copy the data directly into it, so there is a much lower chance of a memory leak. Thanks again.
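As an illustration of the pointer-width point above (my own check, not output from the thread): numpy.uintp is pointer-sized on both 32- and 64-bit builds and matches size_t on common platforms, which is why the read_vlen_attr_raw sketch earlier uses it for both hvl_t fields rather than intp. A quick check plus hypothetical usage of that helper might look like this:
>>> import ctypes, numpy, h5py
>>> numpy.dtype(numpy.uintp).itemsize == ctypes.sizeof(ctypes.c_void_p)  # expected True on 32- and 64-bit builds
>>> numpy.dtype(numpy.uintp).itemsize == ctypes.sizeof(ctypes.c_size_t)  # size_t is pointer-sized on common platforms
>>> f = h5py.File('data.h5', mode='r')
>>> read_vlen_attr_raw(f, 'test')[0]  # expected to round-trip the written [b'a', b'b'] element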