regression in h5py 3.4.0: fletcher32 filter on variable length strings dataset

Summary of the h5py configuration:

h5py                    3.4.0
HDF5                    1.12.1
Python                  3.8.10 (default, Jun 2 2021, 10:49:15) [GCC 9.4.0]
sys.platform            linux (I am on Ubuntu 20.04.2 LTS)
sys.maxsize             9223372036854775807
numpy                   1.19.5
cython (built with)     0.29.24
numpy (built against)   1.17.5
HDF5 (built against)    1.12.1

The following code works with h5py<3.4.0:

import h5py

dt = h5py.special_dtype(vlen=str)

with h5py.File("test.h5", mode="w") as h5:
    log_dset = h5.create_dataset("peter",
                                 (10,),
                                 dtype=dt,
                                 maxshape=(None,),
                                 chunks=True,
                                 fletcher32=True,
                                 compression="gzip")

With h5py 3.4.0, I get the error:

Traceback (most recent call last):
  File "test.py", line 6, in <module>
    log_dset = h5.create_dataset("peter",
  File "/home/paul/repos/dclab/.env/lib/python3.8/site-packages/h5py/_hl/group.py", line 149, in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
  File "/home/paul/repos/dclab/.env/lib/python3.8/site-packages/h5py/_hl/dataset.py", line 137, in make_new_dset
    dset_id = h5d.create(parent.id, name, tid, sid, dcpl=dcpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5d.pyx", line 87, in h5py.h5d.create
ValueError: Unable to create dataset (not suitable for filters)

The error goes away when I remove fletcher32=True, but I would like to keep that extra check, so this looks like a regression to me.
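
As a possible stopgap on the calling side, the checksum can be requested first and dropped only when HDF5 rejects it. The following is a minimal sketch, not part of h5py: `create_dataset_checked` is a hypothetical helper name, it matches on the error text shown in the traceback above (which may differ between HDF5 versions), and it assumes that a rejected `create_dataset` call leaves nothing behind under that name.

```python
def create_dataset_checked(group, name, **kwargs):
    """Create a dataset with fletcher32 if possible, otherwise without it.

    `group` is anything with an h5py-style `create_dataset` method
    (an h5py.File or h5py.Group). Hypothetical helper, not an h5py API.
    """
    try:
        # First attempt: ask for the Fletcher32 checksum filter.
        return group.create_dataset(name, fletcher32=True, **kwargs)
    except ValueError as exc:
        # Only swallow the specific rejection seen in the traceback;
        # the message text is an assumption and may vary by version.
        if "not suitable for filters" not in str(exc):
            raise
        # Second attempt: same arguments, no checksum filter.
        return group.create_dataset(name, **kwargs)
```

With the example above, this would be called as `create_dataset_checked(h5, "peter", shape=(10,), dtype=dt, maxshape=(None,), chunks=True, compression="gzip")`. The `fletcher32` property of the returned dataset then shows whether the checksum was actually applied, so the silent fallback can still be detected and logged.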

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (6 by maintainers)

Top GitHub Comments

2 reactions
takluyver commented, Oct 12, 2021

I’m hesitant to try to be clever from the h5py side. If we add a check and raise an error before creating the dataset, and then a future version of HDF5 makes checksumming vlen data valid, then that’s a bug in h5py. And automatically diagnosing errors after the fact is hard.

There are plenty of errors where the message we get from HDF5 is not especially clear or specific (this example is pretty clear compared to some). I’d rather not set a precedent that h5py should be trying to intercept them and provide better error messages, because a) that’s a mammoth task, and b) it sounds like a bug minefield.


Top Results From Across the Web

Fletcher32 filter on variable length string datasets (not suitable ...
I am getting this “not suitable for filters” error when working with variable length string datasets since the h5py 3.4.0 release.
Strings in HDF5 — h5py 3.7.0 documentation
String data in HDF5 datasets is read as bytes by default: bytes objects for variable-length strings, or numpy bytes arrays ( 'S' dtypes)...
Accessing Fletcher-32 checksum in HDF5 file - Stack Overflow
Suppose I want to check that a particular H5 file is the one I think it is, and hasn't had some dataset altered...
writing to compound dataset with variable length string via ...
Any pointers on what might be the issue? Thanks. I am trying to write a compound type that contains a variable length string as...