Gatherv problems: conversion error and receive-buffer-size segfaults
See original GitHub issue

    if mpicomm.rank == 0:
        F = np.empty(sum(sendcounts), dtype=float)
    else:
        F = None
    mpicomm.comm.Gatherv(sendbuf=local_F, recvbuf=(F, sendcounts), root=0)
The data exchanged by each rank (in the 128-processor case) is roughly 34 MB; a quick check of whether the aggregated counts still fit in a 32-bit int is sketched after the two failure reports below.
- Gatherv problem in the 1-processor case:
OverflowError: value too large to convert to int
Traceback (most recent call last):
File "/data/backup/ARR1905/alsalihi/venv_AARR1905/Test_spacepartitioning/Case20/SMARTA/O2-V2-Huge/SMARTA.py", line 120, in <module>
F = view_factors(mpicomm, universe)
File "/data/backup/ARR1905/alsalihi/venv_AARR1905/Test_spacepartitioning/Case20/SMARTA/O2-V2-Huge/rarfunc.py", line 84, in view_factors
mpicomm.comm.Gatherv(sendbuf=local_F, recvbuf=(F, sendcounts), root=0)
File "mpi4py/MPI/Comm.pyx", line 601, in mpi4py.MPI.Comm.Gatherv
File "mpi4py/MPI/msgbuffer.pxi", line 506, in mpi4py.MPI._p_msg_cco.for_gather
File "mpi4py/MPI/msgbuffer.pxi", line 456, in mpi4py.MPI._p_msg_cco.for_cco_recv
File "mpi4py/MPI/msgbuffer.pxi", line 300, in mpi4py.MPI.message_vector
File "mpi4py/MPI/asarray.pxi", line 22, in mpi4py.MPI.chkarray
File "mpi4py/MPI/asarray.pxi", line 15, in mpi4py.MPI.getarray
OverflowError: value too large to convert to int
- Similar problem in parallel:
[node13.fk.private.vki.eu:20373] Read -1, expected 270734080, errno = 14
[node13:20373:0:20373] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x7f4771e93820)
==== backtrace ====
0 /lib64/libucs.so.0(+0x1b25f) [0x7f53d7ac025f]
1 /lib64/libucs.so.0(+0x1b42a) [0x7f53d7ac042a]
2 /lib64/libc.so.6(+0x15f396) [0x7f53fda7e396]
3 /software/alternate/fk/openmpi/4.0.2/lib/libopen-pal.so.40(opal_convertor_unpack+0x85) [0x7f53dda64895]
4 /software/alternate/fk/openmpi/4.0.2/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_request_progress_frag+0x1a7) [0x7f53d7fe4ed7]
5 /software/alternate/fk/openmpi/4.0.2/lib/openmpi/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x7f) [0x7f53d7ff6a7f]
6 /software/alternate/fk/openmpi/4.0.2/lib/openmpi/mca_btl_vader.so(+0x4d77) [0x7f53d7ff6d77]
7 /software/alternate/fk/openmpi/4.0.2/lib/libopen-pal.so.40(opal_progress+0x2c) [0x7f53dda53f3c]
8 /software/alternate/fk/openmpi/4.0.2/lib/libopen-pal.so.40(ompi_sync_wait_mt+0xb5) [0x7f53dda5a585]
9 /software/alternate/fk/openmpi/4.0.2/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv+0x803) [0x7f53d7fd7ba3]
10 /software/alternate/fk/openmpi/4.0.2/lib/openmpi/mca_coll_basic.so(mca_coll_basic_gatherv_intra+0x18a) [0x7f53d7fc383a]
11 /software/alternate/fk/openmpi/4.0.2/lib/libmpi.so.40(MPI_Gatherv+0xf0) [0x7f53ddc54140]
12 /data/backup/ARR1905/alsalihi/venv_AARR1905/lib64/python3.7/site-packages/mpi4py/MPI.cpython-37m-x86_64-linux-gnu.so(+0x136939) [0x7f53dde38939]
13 /lib64/libpython3.7m.so.1.0(_PyMethodDef_RawFastCallKeywords+0x334) [0x7f53fd6e0154]
14 /lib64/libpython3.7m.so.1.0(_PyCFunction_FastCallKeywords+0x23) [0x7f53fd6e01b3]
15 /lib64/libpython3.7m.so.1.0(+0x140473) [0x7f53fd712473]
16 /lib64/libpython3.7m.so.1.0(_PyEval_EvalFrameDefault+0x192e) [0x7f53fd74913e]
17 /lib64/libpython3.7m.so.1.0(_PyEval_EvalCodeWithName+0x2f0) [0x7f53fd6ff7e0]
18 /lib64/libpython3.7m.so.1.0(_PyFunction_FastCallKeywords+0x2a2) [0x7f53fd700822]
19 /lib64/libpython3.7m.so.1.0(+0x14035f) [0x7f53fd71235f]
20 /lib64/libpython3.7m.so.1.0(_PyEval_EvalFrameDefault+0xb5a) [0x7f53fd74836a]
21 /lib64/libpython3.7m.so.1.0(_PyEval_EvalCodeWithName+0x2f0) [0x7f53fd6ff7e0]
22 /lib64/libpython3.7m.so.1.0(PyEval_EvalCodeEx+0x39) [0x7f53fd700579]
23 /lib64/libpython3.7m.so.1.0(PyEval_EvalCode+0x1b) [0x7f53fd78fccb]
24 /lib64/libpython3.7m.so.1.0(+0x1ffc63) [0x7f53fd7d1c63]
25 /lib64/libpython3.7m.so.1.0(PyRun_FileExFlags+0x97) [0x7f53fd7d21d7]
26 /lib64/libpython3.7m.so.1.0(PyRun_SimpleFileExFlags+0x19a) [0x7f53fd7d893a]
27 /lib64/libpython3.7m.so.1.0(+0x208701) [0x7f53fd7da701]
28 /lib64/libpython3.7m.so.1.0(_Py_UnixMain+0x3c) [0x7f53fd7da8ac]
29 /lib64/libc.so.6(__libc_start_main+0xf3) [0x7f53fd942f43]
30 python3(_start+0x2e) [0x557f8aedd08e]
===================
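Both failure modes come from the same place: the aggregated receive counts no longer fit in the C int that the classic MPI interface uses (see the maintainer comments below). A quick pre-flight check, sketched here with sendcounts from the snippet above (INT_MAX and the error message are illustrative additions, not code from the issue):

    import numpy as np

    INT_MAX = 2**31 - 1  # MPI-3 counts and displacements are C ints

    # Sum in 64-bit so the check itself cannot overflow.
    total = int(np.sum(sendcounts, dtype=np.int64))
    if total > INT_MAX or max(sendcounts) > INT_MAX:
        raise RuntimeError(
            f"Gatherv needs a count/displacement of up to {total} elements, "
            "which exceeds the 32-bit limit; chunk the transfer instead."
        )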
Issue Analytics
- Created 3 years ago
- Comments: 11 (5 by maintainers)
Did you read my previous comment? For N=60,000 you have a total of 3.6G (G = 1 billion) elements, and that is above the MPI 32-bit limit of roughly 2.1G elements. You cannot communicate such a large array with a single communication call; you have to chunk it somehow. Again, this is not mpi4py's fault or laziness; it is a limitation of MPI that has not yet been officially addressed by the standard.
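One way to chunk the transfer, sketched below as an illustration rather than the issue's actual fix: replace the single Gatherv with one point-to-point message per rank, received straight into the matching slice of the root buffer, so no single count or displacement has to describe the full 3.6G-element array. The sketch assumes local_F and sendcounts exactly as in the snippet from the issue; each per-rank piece must still fit in a C int on its own (split it further if it does not), and the tag value is arbitrary.

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # local_F (1-D float64 array) and sendcounts (per-rank lengths) are assumed
    # to exist exactly as in the original snippet.
    if rank == 0:
        displs = np.concatenate(([0], np.cumsum(sendcounts, dtype=np.int64)))
        F = np.empty(displs[-1], dtype='d')      # total may exceed 2**31 - 1
        F[:sendcounts[0]] = local_F              # root's own piece
        for src in range(1, size):
            # Receive each rank's piece straight into its slice of F; the
            # per-message count is sendcounts[src], which must fit in an int.
            comm.Recv([F[displs[src]:displs[src + 1]], MPI.DOUBLE],
                      source=src, tag=77)
    else:
        comm.Send([local_F, MPI.DOUBLE], dest=0, tag=77)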
Another, perhaps easier, way to implement chunking is to use user-defined datatypes. Look at @jeffhammond's BigMPI; all of the ideas and tricks in there are easy to implement with mpi4py.
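The datatype trick can be sketched with mpi4py as follows, under the simplifying assumption that every per-rank count is a multiple of the chosen factor (FACTOR, BIGDOUBLE and unit_counts are illustrative names; BigMPI handles the general case with an extra remainder datatype). Counts and displacements are then expressed in units of FACTOR doubles, so a 3.6G-element gather shrinks to a few million units, well inside the int range.

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    FACTOR = 1024  # doubles per derived-type unit (assumption)

    # Contiguous datatype holding FACTOR doubles.
    BIGDOUBLE = MPI.DOUBLE.Create_contiguous(FACTOR).Commit()

    # Counts in units of BIGDOUBLE; assumes each sendcounts[i] % FACTOR == 0.
    unit_counts = [c // FACTOR for c in sendcounts]

    if rank == 0:
        F = np.empty(sum(sendcounts), dtype='d')
        recvbuf = [F, unit_counts, BIGDOUBLE]
    else:
        recvbuf = None

    comm.Gatherv([local_F, len(local_F) // FACTOR, BIGDOUBLE], recvbuf, root=0)
    BIGDOUBLE.Free()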
MPI 4.0 will have large-count support (second vote shown in https://www.mpi-forum.org/meetings/2020/09/votes), although implementation work is still in progress (https://github.com/pmodels/mpich/issues/4880).