Example for write_direct_chunk with gzip compression?
Based on this nice example using write_direct_chunk
with lz4 compression, I tried to do the same, but with gzip compression. That is, first compressing chunks with gzip, and then writing them directly to an h5 dataset.
It works fine without compression:
import h5py
import numpy
import struct
f = h5py.File("direct_chunk.h5", "w")
block_size = 2048
dataset = f.create_dataset(
    "data",
    (256, 1024, 1024),
    dtype="uint16",
    chunks=(64, 128, 128),
)
array = numpy.random.rand(64, 128, 128) * 2000
array = array.astype("uint16")
# Adding header information according to HDF5 plugin specification
bytes_number_of_elements = struct.pack('>q', (64*128*128*2))
bytes_block_size = struct.pack('>i', block_size*2)
byte_array = bytes_number_of_elements + bytes_block_size + array.tobytes()
dataset.id.write_direct_chunk((0, 0, 128), byte_array)
dataset.id.write_direct_chunk((0, 0, 512), byte_array)
dataset.id.write_direct_chunk((0, 512, 512), byte_array)
f.close()
But when I try to use gzip compression like so:
import h5py
import numpy
import struct
import gzip
f = h5py.File("direct_chunk.h5", "w")
block_size = 2048
dataset = f.create_dataset(
    "data",
    (256, 1024, 1024),
    dtype="uint16",
    chunks=(64, 128, 128),
    compression="gzip",
    compression_opts=4,
)
array = numpy.random.rand(64, 128, 128) * 2000
array = array.astype("uint16")
compressed = gzip.compress(array, compresslevel=4)
# Adding header information according to HDF5 plugin specification
bytes_number_of_elements = struct.pack('>q', (64*128*128*2))
bytes_block_size = struct.pack('>i', block_size*2)
byte_array = bytes_number_of_elements + bytes_block_size + compressed
dataset.id.write_direct_chunk((0, 0, 128), byte_array)
dataset.id.write_direct_chunk((0, 0, 512), byte_array)
dataset.id.write_direct_chunk((0, 512, 512), byte_array)
f.close()
there are no errors when executing this Python code. But when I open the resulting h5 file with Fiji, I get an error:

Data filters: Unable to initialize object ["..\..\src\H5Zdeflate.c line 125 in H5Z_filter_deflate(): inflate() failed"]

which I guess means that the way I do the gzip compression is incompatible with the compression options given to create_dataset?
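That guess can be checked without HDF5 at all: the deflate filter (H5Zdeflate.c) calls zlib's inflate(), which expects a plain zlib stream, while gzip.compress() wraps the deflate data in gzip's own header and trailer. A minimal sketch (the sample data is arbitrary):

import gzip
import zlib

data = b"\x00\x01" * 4096  # stand-in for a chunk's raw bytes

# A plain zlib stream round-trips through zlib.decompress()...
assert zlib.decompress(zlib.compress(data, 4)) == data

# ...but a gzip stream does not: inflate() rejects the gzip header,
# just as HDF5's deflate filter does.
try:
    zlib.decompress(gzip.compress(data, compresslevel=4))
except zlib.error as exc:
    print(exc)  # e.g. "Error -3 while decompressing data: incorrect header check"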
I didn't find an example for write_direct_chunk plus gzip compression, so I would really appreciate it if someone could provide a working example. Thanks a lot!
Operating System: Windows 10
Python version: 3.8 (Anaconda)
print(h5py.version.info):
h5py 2.10.0
HDF5 1.10.5
Python 3.8.1 [MSC v.1916 64 bit (AMD64)]
sys.platform win32
numpy 1.18.2
I believe level=0 should pass the data through unchanged, so see if your compressed data matches what you put in. Beyond that, maybe try writing the same chunk and letting HDF5 do the compression, then find the chunk in the resulting file and compare it to your compressed data (e.g. to see if one has a prefix that the other doesn't).
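For instance, a rough sketch of that comparison (assuming an h5py version that exposes DatasetID.read_direct_chunk; file and dataset names are illustrative):

import zlib

import h5py
import numpy

with h5py.File("compare.h5", "w") as f:
    ds = f.create_dataset(
        "data",
        (64, 128, 128),
        dtype="uint16",
        chunks=(64, 128, 128),
        compression="gzip",
        compression_opts=4,
    )
    array = (numpy.random.rand(64, 128, 128) * 2000).astype("uint16")
    ds[...] = array  # normal write: HDF5 applies the deflate filter itself

    # Read the stored (still compressed) chunk bytes back out.
    filter_mask, stored = ds.id.read_direct_chunk((0, 0, 0))

    # Compare against our own compression of the same data.
    mine = zlib.compress(array.tobytes(), 4)
    print(len(stored), len(mine))              # similar lengths?
    print(stored[:12].hex(), mine[:12].hex())  # does either have a prefix?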
I just tried this out, and it appears that the 'header information' which the bitshuffle-lz4 example adds is not needed here: the raw output from zlib.compress() is what has to be passed to write_direct_chunk. It looks like those 12 bytes of headers are specific to the bitshuffle+lz4 filter, not a general thing for HDF5 filters. Here's a working version of the script (which I'll also add to the examples folder):
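A minimal reconstruction of that script (the original is not reproduced in this extract; this follows the description above: zlib.compress() at the dataset's gzip level, no header bytes, same shapes and offsets as the question):

import zlib

import h5py
import numpy

f = h5py.File("direct_chunk.h5", "w")
dataset = f.create_dataset(
    "data",
    (256, 1024, 1024),
    dtype="uint16",
    chunks=(64, 128, 128),
    compression="gzip",
    compression_opts=4,
)

array = (numpy.random.rand(64, 128, 128) * 2000).astype("uint16")

# HDF5's deflate filter stores a plain zlib stream, so compress with
# zlib (not gzip, which adds its own header/trailer) and pass the raw
# result straight to write_direct_chunk; no extra header bytes needed.
compressed = zlib.compress(array.tobytes(), 4)

dataset.id.write_direct_chunk((0, 0, 128), compressed)
dataset.id.write_direct_chunk((0, 0, 512), compressed)
dataset.id.write_direct_chunk((0, 512, 512), compressed)
f.close()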