Example for write_direct_chunk with gzip compression?

Based on this nice example using write_direct_chunk with lz4 compression, I tried to do the same, but with gzip compression: that is, first compressing chunks with gzip and then writing them directly to an HDF5 dataset.

It works fine without compression:

import h5py
import numpy
import struct

f = h5py.File("direct_chunk.h5", "w")

block_size = 2048
dataset = f.create_dataset("data",
                               (256, 1024, 1024),
                               dtype="uint16",
                               chunks=(64, 128, 128),
                               )

array = numpy.random.rand(64, 128, 128) * 2000
array = array.astype("uint16")

# Adding header information according to HDF5 plugin specification
bytes_number_of_elements = struct.pack('>q', (64*128*128*2))
bytes_block_size = struct.pack('>i', block_size*2)
byte_array = bytes_number_of_elements + bytes_block_size + array.tobytes()

dataset.id.write_direct_chunk((0, 0, 128), byte_array)
dataset.id.write_direct_chunk((0, 0, 512), byte_array)
dataset.id.write_direct_chunk((0, 512, 512), byte_array)

f.close()

But when I try to use gzip compression like so,

import h5py
import numpy
import struct
import gzip

f = h5py.File("direct_chunk.h5", "w")

block_size = 2048
dataset = f.create_dataset("data",
                               (256, 1024, 1024),
                               dtype="uint16",
                               chunks=(64, 128, 128),
                               compression="gzip",
                               compression_opts=4,
                               )

array = numpy.random.rand(64, 128, 128) * 2000
array = array.astype("uint16")
compressed = gzip.compress(array, compresslevel=4)

# Adding header information according to HDF5 plugin specification
bytes_number_of_elements = struct.pack('>q', (64*128*128*2))
bytes_block_size = struct.pack('>i', block_size*2)
byte_array = bytes_number_of_elements + bytes_block_size + compressed

dataset.id.write_direct_chunk((0, 0, 128), byte_array)
dataset.id.write_direct_chunk((0, 0, 512), byte_array)
dataset.id.write_direct_chunk((0, 512, 512), byte_array)

f.close()

the code runs without errors. However, when I open the resulting h5 file with Fiji, I get an error:

Data filters: Unable to initialize object ["..\..\src\H5Zdeflate.c line 125 in H5Z_filter_deflate(): inflate() failed"

which I guess means that the way I do the gzip compression is incompatible with the compression options given to create_dataset?

I didn’t find an example for write_direct_chunk plus gzip compression, so I would really appreciate it if someone could provide a working one. Thanks a lot!


Operating System: Windows 10
Python version: 3.8 (Anaconda)
print(h5py.version.info):
    h5py          2.10.0
    HDF5          1.10.5
    Python        3.8.1 [MSC v.1916 64 bit (AMD64)]
    sys.platform  win32
    numpy         1.18.2

Top GitHub Comments

1 reaction
takluyver commented, Apr 14, 2020

I believe level=0 should pass the data through unchanged, so see if your compressed data matches what you put in. Beyond that, maybe try writing the same chunk and letting HDF5 do the compression, then find the chunk in the resulting file and compare it to your compressed data (e.g. to see if one has a prefix that the other doesn’t).
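A minimal sketch of that comparison, assuming h5py ≥ 3.0, where dataset.id.read_direct_chunk() is available (it did not exist in h5py 2.10); the file name and shapes here are made up for illustration:

import h5py
import numpy
import zlib

array = (numpy.random.rand(64, 128, 128) * 2000).astype("uint16")

with h5py.File("compare_chunks.h5", "w") as f:
    dset = f.create_dataset(
        "data", (64, 128, 128), dtype="uint16", chunks=(64, 128, 128),
        compression="gzip", compression_opts=4,
    )
    # Write through the normal API so HDF5 runs the deflate filter itself.
    dset[...] = array
    f.flush()
    # Read the still-compressed chunk bytes back out (h5py >= 3.0 only).
    filter_mask, hdf5_chunk = dset.id.read_direct_chunk((0, 0, 0))

# Compress the same data by hand and compare byte-for-byte.
manual = zlib.compress(array.tobytes(), 4)
print(len(hdf5_chunk), len(manual))
print(hdf5_chunk == manual)

If the two byte strings differ only by a leading prefix, that prefix is the extra header being written.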

0 reactions
takluyver commented, May 27, 2021

I just tried this out, and it appears that the ‘header information’ which the bitshuffle-lz4 example adds is not needed here - the raw output from zlib.compress() is what has to be passed to write_direct_chunk. It looks like those 12 bytes of headers are specific to the bitshuffle+lz4 filter, not a general thing for HDF5 filters.

Here’s a working version of the script (which I’ll also add to the examples folder):

import h5py
import numpy
import zlib

f = h5py.File("direct_chunk.h5", "w")

block_size = 2048
dataset = f.create_dataset(
    "data", (256, 1024, 1024), dtype="uint16", chunks=(64, 128, 128),
    compression="gzip", compression_opts=4,
)

array = numpy.random.rand(64, 128, 128) * 2000
array = array.astype("uint16")
compressed = zlib.compress(array, level=4)

dataset.id.write_direct_chunk((0, 0, 128), compressed)
dataset.id.write_direct_chunk((0, 0, 512), compressed)
dataset.id.write_direct_chunk((0, 512, 512), compressed)

numpy.testing.assert_array_equal(dataset[:64, :128, 128:256], array)

f.close()
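
It is worth noting that the switch from gzip.compress() to zlib.compress() in this script likely matters as well: HDF5’s deflate filter stores a raw zlib stream, while gzip.compress() adds gzip framing (the 0x1f 0x8b magic bytes and a trailer) that plain inflate() rejects, which matches the H5Z_filter_deflate error above. A quick check with made-up sample data:

import gzip
import zlib

data = b"example payload" * 100
print(zlib.compress(data, 4)[:2].hex())                # zlib stream: starts with 0x78
print(gzip.compress(data, compresslevel=4)[:2].hex())  # gzip stream: magic bytes 1f 8b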
