h5repack method
Problem: I need to delete specific groups, which contain large datasets, from a file. For that, I’m doing:
```python
with h5py.File('file_name', 'a') as f:
    del f['group_name']
```
This indeed removes the group, but the file size remains unchanged.
In scattered answers online, I found that a possible solution would be to use h5repack. I can’t find, however, whether this functionality is exposed anywhere in h5py. I’m opening this issue in the hope that it’s a reasonable request.
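For reference, the usual workaround is the `h5repack` command-line tool shipped with HDF5, which rewrites the file into a new one without the unreferenced space. The file names below are placeholders; this is a sketch of the invocation, not something h5py runs for you:

```shell
# Rewrite file.h5 into a compacted copy; the input file is left untouched.
h5repack file.h5 repacked.h5

# Optionally apply a filter while repacking, e.g. gzip at level 6:
h5repack -f GZIP=6 file.h5 repacked.h5
```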
- Operating System: Windows 10
- Python version: 3.7 (Anaconda 3)
- h5py version: 2.9.0
- HDF5 version: 1.10.4
Issue Analytics
- State:
- Created 4 years ago
- Comments: 10 (5 by maintainers)
PyTables has repack functionality that I think you can import. Also, PRs are welcome here. Note that repacking creates a new, smaller file; it doesn’t change the size of the existing file.
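The same "new, smaller file" idea can be sketched in pure h5py by copying the surviving objects into a fresh file. This is a minimal illustration, not an official h5py feature; the file names and dataset names are made up, and root attributes, if any, would need to be copied separately:

```python
import os
import tempfile

import h5py
import numpy as np

tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "data.h5")          # hypothetical input file
dst = os.path.join(tmpdir, "data_repacked.h5")  # hypothetical output file

# Create a file with a small group and a large one.
with h5py.File(src, "w") as f:
    f.create_dataset("keep/values", data=np.zeros(1_000))
    f.create_dataset("big/values", data=np.zeros(500_000))

# Deleting a group unlinks it, but the file size does not shrink.
with h5py.File(src, "a") as f:
    del f["big"]

# Workaround: copy the remaining top-level objects into a fresh file.
with h5py.File(src, "r") as fin, h5py.File(dst, "w") as fout:
    for name in fin:
        fin.copy(name, fout)

# The original file still holds the unreclaimed space; the copy does not.
print(os.path.getsize(src) > os.path.getsize(dst))
```

The copy loop is essentially what h5repack does internally, minus its filter and chunking options, so for anything beyond simple cases the external tool is the safer choice.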
Automatically calling h5repack on close is not something safe to do. If you remove something small from the start of a very big file, you could incur a huge amount of I/O as you re-write multiple gigabytes of data to reclaim a few KB.

@danschef I think we would happily merge a PR to the docs adding the text that would have saved you time/confusion!