Error trying to do chunked, compressed parallel write with MPI
I am trying to do a chunked, compressed parallel write using h5py. My test script is a slightly modified version of the one from the docs:
from mpi4py import MPI
import h5py
import numpy as np
rank = MPI.COMM_WORLD.rank # The process ID (integer 0-3 for 4-process run)
f = h5py.File('parallel_test.hdf5', 'w', driver='mpio', comm=MPI.COMM_WORLD)
dset = f.create_dataset('test', (4, 1000), dtype='i',
chunks=(1, 1000), compression="gzip")
dset[rank] = np.full(1000, rank)
f.close()
I run this via mpirun -np 4 python basic_hdf_write.py. This produces the error:
_frozen_importlib:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 88 from C header, got 96 from PyObject
Traceback (most recent call last):
File "basic_hdf_write.py", line 11, in <module>
dset[rank] = np.full(1000, rank)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/rigel/ocp/users/ra2697/conda/envs/hdf5_zarr/lib/python3.6/site-packages/h5py/_hl/dataset.py", line 708, in __setitem__
self.id.write(mspace, fspace, val, mtype, dxpl=self._dxpl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5d.pyx", line 221, in h5py.h5d.DatasetID.write
File "h5py/_proxy.pyx", line 132, in h5py._proxy.dset_rw
File "h5py/_proxy.pyx", line 93, in h5py._proxy.H5PY_H5Dwrite
OSError: Can't write data (Can't perform independent write with filters in pipeline.
The following caused a break from collective I/O:
Local causes: independent I/O was requested; datatype conversions were required
Global causes: independent I/O was requested; datatype conversions were required)
Traceback (most recent call last):
File "h5py/_objects.pyx", line 193, in h5py._objects.ObjectID.__dealloc__
RuntimeError: Can't decrement id ref count (can't close file, there are objects still open)
Exception ignored in: 'h5py._objects.ObjectID.__dealloc__'
Traceback (most recent call last):
File "h5py/_objects.pyx", line 193, in h5py._objects.ObjectID.__dealloc__
RuntimeError: Can't decrement id ref count (can't close file, there are objects still open)
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[25903,1],0]
Exit code: 1
--------------------------------------------------------------------------
Googling the "Can't perform independent write with filters in pipeline" error led me to the following HDF5 threads:
- https://groups.google.com/forum/#!topic/h5py/bElGpATAwAI
- https://forum.hdfgroup.org/t/parallel-i-o-does-not-support-filters-yet/884 (March '18)
- https://www.hdfgroup.org/2018/04/why-should-i-care-about-the-hdf5-1-10-2-release/ (describes how to use compression with HDF5 parallel applications)
- https://forum.hdfgroup.org/t/compressed-parallel-writing-problem/4979/2 (Oct. '18)
It sounds like compressed, parallel writes should definitely work with HDF5 1.10.4 (the version I am using with h5py). However, the h5py docs don’t give any examples of how to use this feature.
I would appreciate any suggestions you may have.
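For reference, the pattern those threads describe is a collective write. Below is a minimal sketch of what that might look like in h5py; this is my reading of the docs and the 1.10.2 article, not a confirmed fix. It assumes the with dset.collective: context manager and matches the write dtype to the dataset, to avoid the "datatype conversions were required" cause listed in the error above.
from mpi4py import MPI
import h5py
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.rank

f = h5py.File('parallel_test.hdf5', 'w', driver='mpio', comm=comm)
dset = f.create_dataset('test', (4, 1000), dtype='i',
                        chunks=(1, 1000), compression="gzip")

# dtype='i4' matches the dataset's 32-bit integer type, so no conversion is needed
data = np.full(1000, rank, dtype='i4')

# Writes through a compression filter must be collective: every rank has to
# reach this block and take part in the write.
with dset.collective:
    dset[rank] = data

f.close()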
My h5py was installed from the new conda-forge build with mpi support. I am on Linux.
h5py 2.9.0
HDF5 1.10.4
Python 3.6.7 | packaged by conda-forge | (default, Nov 21 2018, 02:32:25)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)]
sys.platform linux
sys.maxsize 9223372036854775807
numpy 1.15.0
I am using Open MPI version 3.1.2.
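As a sanity check on the build (my own addition, not part of the original report), h5py exposes both the version report above and a flag saying whether MPI support was compiled in:
import h5py
print(h5py.version.info)       # the same version summary shown above
print(h5py.get_config().mpi)   # True if this build supports driver='mpio'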
I can confirm that this error is still present, even when using the collective helper, with HDF5 1.10.5, h5py 2.9.0, and OpenMPI 3.1.4.
Edit 3 – The original does work as long as the offsets are non-zero
It works when the offsets are non-zero; I mean, every worker has to write! You cannot skip writing to the array from a worker.
Edit 2 – It works if the dataset is not being sliced heterogeneously:
edit 2
Edit – it's not working… I was being silly…
edit
~Just for posterity, I attached code with an added complexity: writing irregularly sized arrays to the same array using offsets. There were a couple of hiccups, but this seems to work for me without any problem!~
~Quite necessary was the qualifier, without which the program would just hang. [In my use case, I only need to write from some processors.]~
Attached code
~The comm.Barrier() is not really necessary – a remnant of previous iterations.~
~It is important to note that I compiled HDF5 "by hand" using the cluster's installed gcc and openmpi.~
h5pcc -showconfig
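The attachment itself is not reproduced here, so the following is only a rough reconstruction of the pattern that comment describes: every rank writes a differently sized slab at its own offset, inside the collective context. The sizes, dataset name, and file name are made up for illustration.
from mpi4py import MPI
import h5py
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.rank
size = comm.size

# Example sizes only: rank r writes (r + 1) * 100 elements.
counts = [(r + 1) * 100 for r in range(size)]
offsets = [0] + list(np.cumsum(counts)[:-1])
total = sum(counts)

f = h5py.File('irregular_test.hdf5', 'w', driver='mpio', comm=comm)
dset = f.create_dataset('data', (total,), dtype='i4',
                        chunks=(100,), compression='gzip')

start = int(offsets[rank])
stop = start + counts[rank]

# Every rank must take part in the collective write (cf. "Edit 3" above).
with dset.collective:
    dset[start:stop] = np.full(counts[rank], rank, dtype='i4')

f.close()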