regression in h5py 3.4.0: fletcher32 filter on variable length strings dataset
Summary of the h5py configuration:
h5py 3.4.0
HDF5 1.12.1
Python 3.8.10 (default, Jun 2 2021, 10:49:15) [GCC 9.4.0]
sys.platform linux
sys.maxsize 9223372036854775807
numpy 1.19.5
cython (built with) 0.29.24
numpy (built against) 1.17.5
HDF5 (built against) 1.12.1
(I am on Ubuntu 20.04.2 LTS.)
The following code works with h5py<3.4.0:
import h5py

dt = h5py.special_dtype(vlen=str)
with h5py.File("test.h5", mode="w") as h5:
    log_dset = h5.create_dataset("peter",
                                 (10,),
                                 dtype=dt,
                                 maxshape=(None,),
                                 chunks=True,
                                 fletcher32=True,
                                 compression="gzip")
With h5py 3.4.0, I get the error:
Traceback (most recent call last):
  File "test.py", line 6, in <module>
    log_dset = h5.create_dataset("peter",
  File "/home/paul/repos/dclab/.env/lib/python3.8/site-packages/h5py/_hl/group.py", line 149, in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
  File "/home/paul/repos/dclab/.env/lib/python3.8/site-packages/h5py/_hl/dataset.py", line 137, in make_new_dset
    dset_id = h5d.create(parent.id, name, tid, sid, dcpl=dcpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5d.pyx", line 87, in h5py.h5d.create
ValueError: Unable to create dataset (not suitable for filters)
The error goes away when I remove fletcher32=True. But I would like to have that extra check, so this looks like a regression to me.
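Until this is resolved, a caller-side fallback is one way to keep the checksum where HDF5 accepts it. The following is only a sketch: the helper name create_dataset_checksum_if_possible is hypothetical (it is not part of h5py), and it assumes the failure always surfaces as a ValueError whose message contains "not suitable for filters", as in the traceback above.

```python
def create_dataset_checksum_if_possible(group, name, **kwargs):
    """Create a dataset with fletcher32 enabled, retrying without it if refused.

    `group` is anything with an h5py-style create_dataset() method, for
    example an h5py.File or h5py.Group. Returns (dataset, checksummed).
    """
    # The helper decides about fletcher32 itself, so drop any caller value.
    kwargs.pop("fletcher32", None)
    try:
        return group.create_dataset(name, fletcher32=True, **kwargs), True
    except ValueError as exc:
        # Assumption: only the message quoted in the traceback above means
        # "checksum not applicable"; every other ValueError is re-raised.
        if "not suitable for filters" not in str(exc):
            raise
    # Retry without the checksum filter; all other options stay unchanged.
    return group.create_dataset(name, fletcher32=False, **kwargs), False
```

With the example above, the call would be create_dataset_checksum_if_possible(h5, "peter", shape=(10,), dtype=dt, maxshape=(None,), chunks=True, compression="gzip"). The second return value tells the caller whether the checksum was actually applied, so the loss of the extra check is at least visible rather than silent.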
I’m hesitant to try to be clever from the h5py side. If we add a check and raise an error before creating the dataset, and then a future version of HDF5 makes checksumming vlen data valid, then that’s a bug in h5py. And automatically diagnosing errors after the fact is hard.
There are plenty of errors where the message we get from HDF5 is not especially clear or specific (this example is pretty clear compared to some). I’d rather not set a precedent that h5py should be trying to intercept them and provide better error messages, because a) that’s a mammoth task, and b) it sounds like a bug minefield.
I created a thread at the HDF forum: https://forum.hdfgroup.org/t/fletcher32-filter-on-variable-length-string-datasets-not-suitable-for-filters/9038