Unusual vlen type Attribute can be created but not read.


First, I’ve done my testing in the following setup:

h5py    3.1.0
HDF5    1.12.0
Python  3.9.1 (default, Jan 20 2021, 00:00:00) 
[GCC 10.2.1 20201125 (Red Hat 10.2.1-9)]
sys.platform    linux
sys.maxsize     9223372036854775807
numpy   1.20.0
cython (built with) 0.29.21
numpy (built against) 1.19.3
HDF5 (built against) 1.12.0

However, other people using the package I wrote, which relies on this unusual Attribute, have been reporting the problem in h5py 3.0.0. I tested the same code with 2.10.0 earlier today with no problem, and I have been using the Attribute successfully in every h5py 2.x.y version since 2.3 (the first one where this Attribute could even be created, since 2.2 did not support writing it).

The Attribute is an array of vlens of type numpy.dtype('S1').

The following code creates and writes the Attribute, but reading it back afterwards raises an error:

>>> import numpy, h5py
>>> dt = h5py.vlen_dtype(numpy.dtype('S1'))
>>> a = numpy.empty((1, ), dtype=dt)
>>> a[0] = numpy.array([b'a', b'b'], dtype='S1')
>>> f = h5py.File('data.h5', mode='a')
>>> f.attrs.create('test', a)
>>> f.attrs['test']

with the following output

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-b22c92ac9bc4> in <module>
      5 f = h5py.File('data.h5', mode='a')
      6 f.attrs.create('test', a)
----> 7 f.attrs['test']

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

~/.local/lib/python3.9/site-packages/h5py/_hl/attrs.py in __getitem__(self, name)
     75 
     76         arr = numpy.ndarray(shape, dtype=dtype, order='C')
---> 77         attr.read(arr, mtype=htype)
     78 
     79         string_info = h5t.check_string_dtype(dtype)

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5a.pyx in h5py.h5a.AttrID.read()

h5py/_proxy.pyx in h5py._proxy.attr_rw()

h5py/_conv.pyx in h5py._conv.vlen2ndarray()

h5py/_conv.pyx in h5py._conv.conv_vlen2ndarray()

ValueError: data type must provide an itemsize

Similar results are obtained trying to read it with f.attrs.get('test'), f.attrs.items(), and f.attrs.values().

I also can’t read it at a lower level with

>>> id = f.attrs.get_id('test')
>>> b = numpy.zeros(id.shape, dtype=id.dtype)
>>> id.read(b)

and get the same error (probably because those other methods do this under the hood).

For reference, I ran h5dump data.h5 on the file and got the following result

HDF5 "data.h5" {
GROUP "/" {
   ATTRIBUTE "test" {
      DATATYPE  H5T_VLEN { H5T_STRING {
         STRSIZE 1;
         STRPAD H5T_STR_NULLPAD;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }}
      DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
      DATA {
      (0): ("a", "b")
      }
   }
}
}

I do not know if this is a bug, an intentional feature drop, or something else. Even if it is a bug and is fixed in the next release of h5py, I still need a workaround to read this in the affected versions, since I need to support all h5py versions from 2.3 to the present with full functionality. It is almost surely readable using the low-level libhdf5 bindings in h5py, but to be honest I am not sure where to start or how stable those bindings are. How would one read this kind of Attribute in h5py 3.x in its present state?
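
One way to support both working and affected h5py versions might be to try the normal read first and only fall back to a low-level workaround when the ValueError from the traceback above appears. The sketch below is only an illustration: read_attr_with_fallback and the fallback callable are hypothetical names, not part of h5py, and a small stand-in class replaces f.attrs so the example runs without an HDF5 file.

```python
def read_attr_with_fallback(attrs, name, fallback):
    """Try the ordinary read; on ValueError, delegate to a workaround.

    `attrs` is anything indexable by attribute name (f.attrs in real use).
    `fallback` is a hypothetical callable taking (attrs, name).
    """
    try:
        return attrs[name]
    except ValueError:
        # Affected h5py versions raise here for vlen-of-'S1' attributes.
        return fallback(attrs, name)


class _BrokenAttrs:
    """Stand-in for f.attrs on an affected version (assumption)."""

    def __getitem__(self, name):
        raise ValueError("data type must provide an itemsize")


result = read_attr_with_fallback(
    _BrokenAttrs(), "test", lambda attrs, name: "fallback used"
)
print(result)
```

Catching ValueError this broadly could also hide unrelated failures, so a real implementation would probably want to check the error message or the attribute's type before falling back.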

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments:19 (13 by maintainers)

Top GitHub Comments

frejanordsiek commented, Mar 14, 2021 (1 reaction)

Thank you. The release fixed it.

By the way, someone reported that the workaround I posted earlier segfaults on 32-bit systems. I think it came down to not accounting for the difference between the pointer size in the file and on the system. Anyhow, the following version doesn’t segfault on 32-bit or 64-bit little-endian systems (who knows about big-endian), and it no longer reads the size_t as an intp.

>>> import ctypes
>>> import numpy
>>> import h5py
>>> import h5py._objects
>>>
>>> dt = h5py.vlen_dtype(numpy.dtype('S1'))
>>> a = numpy.empty((2, ), dtype=dt)
>>> a[0] = numpy.array([b'a', b'b'], dtype='S1')
>>> a[1] = numpy.array([b'c', b'd', b'e'], dtype='S1')
>>> f = h5py.File('data.h5', mode='a')
>>> f.attrs.create('test', a)
>>>
>>> dt = numpy.dtype([('length', numpy.uintp), ('pointer', numpy.intp)])
>>> with h5py._objects.phil:
...     attr_id = f.attrs.get_id('test')
...     raw_buf = numpy.empty(attr_id.shape, dtype=dt)
...     attr_id.read(raw_buf, mtype=attr_id.get_type())
...     attr = numpy.empty(raw_buf.shape, dtype='object')
...     for i in range(len(attr)):
...         length = int(raw_buf[i]['length'])
...         ptr = int(raw_buf[i]['pointer'])
...         attr[i] = numpy.empty(length, dtype='S1')
...         ctypes.memmove(attr[i].ctypes.data, ptr, length)
...
>>> attr
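
For anyone unsure what the (length, pointer) records above represent: HDF5 hands back each vlen element as an hvl_t struct, a size_t length followed by a pointer to the element data. The sketch below imitates that layout with ctypes alone, so it runs without h5py or an HDF5 file. HvlT, payloads, and copy_out are illustrative names, and the buffers are stand-ins for memory that libhdf5 would normally allocate.

```python
import ctypes


class HvlT(ctypes.Structure):
    """Mirror of HDF5's hvl_t: a size_t length, then a data pointer."""
    _fields_ = [("length", ctypes.c_size_t), ("pointer", ctypes.c_void_p)]


# Stand-ins for the per-element buffers libhdf5 would allocate (assumption).
payloads = [
    ctypes.create_string_buffer(b"ab", 2),
    ctypes.create_string_buffer(b"cde", 3),
]

# Stand-in for raw_buf: one (length, pointer) record per vlen element.
raw = (HvlT * len(payloads))()
for slot, buf in zip(raw, payloads):
    slot.length = len(buf)
    slot.pointer = ctypes.addressof(buf)


def copy_out(records):
    """Copy each record's bytes into memory owned by Python."""
    out = []
    for rec in records:
        dest = ctypes.create_string_buffer(rec.length)
        ctypes.memmove(dest, rec.pointer, rec.length)
        out.append(dest.raw)
    return out


decoded = copy_out(raw)
print(decoded)
```

Using c_size_t and c_void_p lets the struct follow the platform's own sizes, which is the same reason the workaround above pairs numpy.uintp with numpy.intp instead of assuming 64-bit fields.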

frejanordsiek commented, Feb 23, 2021 (1 reaction)

No need to re-open this, but I made a slightly improved version for anyone else reading this who needs the workaround. This version avoids making any intermediate copies; instead, it allocates the ndarray first and uses ctypes.memmove to copy the data directly into it. So there is a much lower chance of a memory leak.

>>> import ctypes
>>> import numpy
>>> import h5py
>>> import h5py._objects
>>>
>>> dt = h5py.vlen_dtype(numpy.dtype('S1'))
>>> a = numpy.empty((2, ), dtype=dt)
>>> a[0] = numpy.array([b'a', b'b'], dtype='S1')
>>> a[1] = numpy.array([b'c', b'd', b'e'], dtype='S1')
>>> f = h5py.File('data.h5', mode='a')
>>> f.attrs.create('test', a)
>>>
>>> dt = numpy.dtype('p')
>>> with h5py._objects.phil:
...     attr_id = f.attrs.get_id('test')
...     attr_size = attr_id.get_storage_size()
...     if attr_size % (2 * dt.itemsize) != 0:
...         raise RuntimeError('Data size is not a multiple of {0} bytes.'.format(2 * dt.itemsize))
...     raw_buf = numpy.empty(attr_size // dt.itemsize, dtype=dt)
...     attr_id.read(raw_buf, mtype=attr_id.get_type())
...     attr = numpy.empty(len(raw_buf) // 2, dtype='object')
...     for i in range(len(attr)):
...         length = int(raw_buf[2 * i])
...         ptr = int(raw_buf[(2 * i) + 1])
...         attr[i] = numpy.empty(length, dtype='S1')
...         ctypes.memmove(attr[i].ctypes.data, ptr, length)
...
>>> attr

Thanks again.
