Example for write_direct_chunk with gzip compression?
Based on this nice example using write_direct_chunk
with lz4 compression, I tried to do the same, but with gzip compression. That is, first compressing chunks with gzip, and then writing them directly to an h5 dataset.
It works fine without compression:
import h5py
import numpy
import struct
f = h5py.File("direct_chunk.h5", "w")
block_size = 2048
dataset = f.create_dataset(
    "data",
    (256, 1024, 1024),
    dtype="uint16",
    chunks=(64, 128, 128),
)
array = numpy.random.rand(64, 128, 128) * 2000
array = array.astype("uint16")
# Adding header information according to HDF5 plugin specification
bytes_number_of_elements = struct.pack('>q', (64*128*128*2))
bytes_block_size = struct.pack('>i', block_size*2)
byte_array = bytes_number_of_elements + bytes_block_size + array.tobytes()
dataset.id.write_direct_chunk((0, 0, 128), byte_array)
dataset.id.write_direct_chunk((0, 0, 512), byte_array)
dataset.id.write_direct_chunk((0, 512, 512), byte_array)
f.close()
But when I try to use gzip compression like so:
import h5py
import numpy
import struct
import gzip
f = h5py.File("direct_chunk.h5", "w")
block_size = 2048
dataset = f.create_dataset(
    "data",
    (256, 1024, 1024),
    dtype="uint16",
    chunks=(64, 128, 128),
    compression="gzip",
    compression_opts=4,
)
array = numpy.random.rand(64, 128, 128) * 2000
array = array.astype("uint16")
compressed = gzip.compress(array, compresslevel=4)
# Adding header information according to HDF5 plugin specification
bytes_number_of_elements = struct.pack('>q', (64*128*128*2))
bytes_block_size = struct.pack('>i', block_size*2)
byte_array = bytes_number_of_elements + bytes_block_size + compressed
dataset.id.write_direct_chunk((0, 0, 128), byte_array)
dataset.id.write_direct_chunk((0, 0, 512), byte_array)
dataset.id.write_direct_chunk((0, 512, 512), byte_array)
f.close()
there are no errors when executing this Python code. But when I open the resulting h5 file with Fiji, I get an error:

Data filters: Unable to initialize object ["..\..\src\H5Zdeflate.c line 125 in H5Z_filter_deflate(): inflate() failed"]

which I guess means that the way I do the gzip compression is incompatible with the compression options given to create_dataset?
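That guess can be checked without HDF5 at all: the deflate filter (H5Zdeflate.c) calls zlib's inflate(), which expects a plain zlib stream, while gzip.compress() wraps the deflate data in gzip's own header and trailer. A minimal sketch (the sample data is arbitrary):

import gzip
import zlib

data = b"\x00\x01" * 4096  # stand-in for a chunk's raw bytes

# A plain zlib stream round-trips through zlib.decompress()...
assert zlib.decompress(zlib.compress(data, 4)) == data

# ...but a gzip stream does not: inflate() rejects the gzip header,
# just as HDF5's deflate filter does.
try:
    zlib.decompress(gzip.compress(data, compresslevel=4))
except zlib.error as exc:
    print(exc)  # e.g. "Error -3 while decompressing data: incorrect header check"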
I didn't find an example for write_direct_chunk plus gzip compression, so I would really appreciate it if someone could provide a working example. Thanks a lot!
Operating System: Windows 10
Python version: 3.8 (Anaconda)
print(h5py.version.info):
h5py 2.10.0
HDF5 1.10.5
Python 3.8.1 [MSC v.1916 64 bit (AMD64)]
sys.platform win32
numpy 1.18.2
I believe level=0 should pass the data through unchanged, so see if your compressed data matches what you put in. Beyond that, maybe try writing the same chunk and letting HDF5 do the compression, then find the chunk in the resulting file and compare it to your compressed data (e.g. to see if one has a prefix that the other doesn't).
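For instance, a rough sketch of that comparison (assuming an h5py version that exposes DatasetID.read_direct_chunk; file and dataset names are illustrative):

import zlib

import h5py
import numpy

with h5py.File("compare.h5", "w") as f:
    ds = f.create_dataset(
        "data",
        (64, 128, 128),
        dtype="uint16",
        chunks=(64, 128, 128),
        compression="gzip",
        compression_opts=4,
    )
    array = (numpy.random.rand(64, 128, 128) * 2000).astype("uint16")
    ds[...] = array  # normal write: HDF5 applies the deflate filter itself

    # Read the stored (still compressed) chunk bytes back out.
    filter_mask, stored = ds.id.read_direct_chunk((0, 0, 0))

    # Compare against our own compression of the same data.
    mine = zlib.compress(array.tobytes(), 4)
    print(len(stored), len(mine))              # similar lengths?
    print(stored[:12].hex(), mine[:12].hex())  # does either have a prefix?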
I just tried this out, and it appears that the 'header information' which the bitshuffle-lz4 example adds is not needed here: the raw output from zlib.compress() is what has to be passed to write_direct_chunk. It looks like those 12 bytes of headers are specific to the bitshuffle+lz4 filter, not a general thing for HDF5 filters. Here's a working version of the script (which I'll also add to the examples folder):
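A minimal reconstruction of that script (the original is not reproduced in this extract; this follows the description above: zlib.compress() at the dataset's gzip level, no header bytes, same shapes and offsets as the question):

import zlib

import h5py
import numpy

f = h5py.File("direct_chunk.h5", "w")
dataset = f.create_dataset(
    "data",
    (256, 1024, 1024),
    dtype="uint16",
    chunks=(64, 128, 128),
    compression="gzip",
    compression_opts=4,
)

array = (numpy.random.rand(64, 128, 128) * 2000).astype("uint16")

# HDF5's deflate filter stores a plain zlib stream, so compress with
# zlib (not gzip, which adds its own header/trailer) and pass the raw
# result straight to write_direct_chunk; no extra header bytes needed.
compressed = zlib.compress(array.tobytes(), 4)

dataset.id.write_direct_chunk((0, 0, 128), compressed)
dataset.id.write_direct_chunk((0, 0, 512), compressed)
dataset.id.write_direct_chunk((0, 512, 512), compressed)
f.close()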