
Problem: I need to delete specific groups from a file, which contain large datasets. For that, I’m doing:

with h5py.File('file_name', 'a') as f:
    del f['group_name']

This indeed removes the group, but the file size remains unchanged.

In scattered answers online, I found that a possible solution would be using h5repack. I can’t find, however, whether this functionality is exposed anywhere in h5py. I’m opening this issue in the hope that it’s a reasonable request.

  • Operating System: Windows 10
  • Python version 3.7
  • Anaconda 3
  • h5py version 2.9.0
  • HDF5 version 1.10.4

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

1 reaction
scopatz commented, Jul 23, 2019

PyTables has repack functionality that I think you can import. Also, PRs welcome here. Also, note that repack creates a new, smaller file; it doesn’t change the size of the existing file.

0 reactions
tacaswell commented, May 26, 2021

Automatically calling h5repack on close is not something safe to do. If you remove something small from the start of a very big file, you could incur a huge amount of I/O as you re-write multiple gigabytes of data to reclaim a few kb.

@danschef I think we would happily merge a PR to the docs adding the text that would have saved you time/confusion!
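One way to decide whether a repack is worth that I/O: h5py's low-level API exposes `H5Fget_freespace`, which reports how much free space the file's space manager is tracking. A sketch (file name is made up; note that with HDF5's default file-space strategy this count is only meaningful within the session that did the deleting, since free-space info is not persisted on close):

```python
import h5py
import numpy as np

with h5py.File("freespace_demo.h5", "w") as f:
    f.create_dataset("doomed", data=np.zeros(100_000))
    del f["doomed"]
    # Bytes of free space currently tracked in the file; a large value
    # relative to the file size suggests a repack is worthwhile.
    wasted = f.id.get_freespace()
    print(f"{wasted} bytes would be reclaimed by a repack")
```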
