mpi4py error when collecting results (used together with SLURM)
ERROR:
Traceback (most recent call last):
  File "/opt/software/anaconda/3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/software/anaconda/3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/vasko/.local/lib/python3.6/site-packages/mpi4py/futures/__main__.py", line 72, in <module>
    main()
  File "/home/vasko/.local/lib/python3.6/site-packages/mpi4py/futures/__main__.py", line 60, in main
    run_command_line()
  File "/home/vasko/.local/lib/python3.6/site-packages/mpi4py/run.py", line 47, in run_command_line
    run_path(sys.argv[0], run_name='__main__')
  File "/opt/software/anaconda/3/lib/python3.6/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/opt/software/anaconda/3/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/opt/software/anaconda/3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "cali_send_2.py", line 137, in <module>
    globals()[sys.argv[1]](sys.argv[2], sys.argv[3])
  File "cali_send_2.py", line 94, in solve_on_cali
    sols = list(executor.map(solve_matrix, repeat(inputs), range(len(wls)), wls))
  File "/home/vasko/.local/lib/python3.6/site-packages/mpi4py/futures/pool.py", line 207, in result_iterator
    yield futures.pop().result()
  File "/opt/software/anaconda/3/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/opt/software/anaconda/3/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 5: invalid start byte
ENV:
- CentOS release 6.5 (Final)
- Python 3.6 (Anaconda)
- mpiexec (OpenRTE) 1.8.2
- mpi4py 3.0.3
Piece of Code:
# Imports this fragment relies on
import os
from itertools import repeat
from zipfile import ZipFile

from mpi4py.futures import MPIPoolExecutor

inputs = [der_mats, ref_ind_yee_grid, n_xy_sq, param_sweep_on, i_m, inv_eps, sol_params]
with MPIPoolExecutor(max_workers=int(nodes)) as executor:
    # list() consumes the result iterator, so all tasks are done once it returns
    sols = list(executor.map(solve_matrix, repeat(inputs), range(len(wls)), wls))
    executor.shutdown(wait=True)  # wait for all tasks to complete

# The with block closes the archive so it is fully written to disk
with ZipFile(zp_fl_nm, 'w') as zipobj:
    for sol in sols:
        w, v, solnum, vq = sol
        print(w[0], solnum)  # this line shows whether the data has duplicates
        # Note: tofile() writes raw bytes, not the .npy format
        w.tofile(f"w_sol_{solnum}.npy")
        v.tofile(f"v_sol_{solnum}.npy")
        vq.tofile(f"vq_sol_{solnum}.npy")
        zipobj.write(f"w_sol_{solnum}.npy")
        zipobj.write(f"v_sol_{solnum}.npy")
        zipobj.write(f"vq_sol_{solnum}.npy")
        os.remove(f"w_sol_{solnum}.npy")
        os.remove(f"v_sol_{solnum}.npy")
        os.remove(f"vq_sol_{solnum}.npy")
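The write, zip, and delete loop above could also be done without temporary files. The following is only a sketch, not the code from the issue: `write_solutions` is a hypothetical helper, and it swaps `tofile()` for `np.save()`, so the archive members are real `.npy` files that keep their dtype and shape.

```python
import io
from zipfile import ZipFile

import numpy as np


def write_solutions(zip_target, sols):
    """Store each (w, v, solnum, vq) tuple as three .npy members of one zip archive.

    Hypothetical helper: zip_target may be a file name or a binary file-like object.
    """
    with ZipFile(zip_target, "w") as zipobj:
        for w, v, solnum, vq in sols:
            for prefix, arr in (("w", w), ("v", v), ("vq", vq)):
                buf = io.BytesIO()
                np.save(buf, arr)  # .npy format: header with dtype and shape, then data
                zipobj.writestr(f"{prefix}_sol_{solnum}.npy", buf.getvalue())
```

A member can later be read back with `np.load(io.BytesIO(zf.read(name)))`, where `zf` is the archive opened for reading.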
I call the method by submitting a command like this:
f'srun --mpi=pmi2 -n ${{SLURM_NTASKS}} python -m mpi4py.futures cali_send_2.py solve_on_cali \"\"{name}\"\" {num_nodes}'
The error does not always appear; it depends on the range I use for wls.
With wls = np.arange(0.4e-6, 1.8e-6, 0.01e-6) it crashes with this error, or, if the step is 0.1e-6, it returns duplicates of some solutions.
With wls = np.arange(0.55e-6, 1.55e-6, 0.01e-6), or with a step of 0.1e-6 or 0.001e-6, it does NOT crash with the mentioned error and returns good results without duplicates.
Could someone please explain the origin of this error? I suspect floating-point values such as 1.699999999999999999999e-6.
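Whether float rounding has anything to do with the error is not established in this issue. To rule it out, one option is to build the wavelength grid from integers and scale it once, since `np.arange` with a non-integer step can produce values like the one quoted above. A minimal sketch, where `make_wavelengths` and its nanometre arguments are hypothetical names chosen for illustration:

```python
import numpy as np


def make_wavelengths(start_nm, stop_nm, step_nm):
    """Return wavelengths in metres on an integer nanometre grid (stop excluded).

    Hypothetical helper: integer arange gives an exact, predictable number of
    points; the only rounding is the single multiplication by 1e-9.
    """
    return np.arange(start_nm, stop_nm, step_nm, dtype=np.int64) * 1e-9


# Same span as np.arange(0.4e-6, 1.8e-6, 0.01e-6), i.e. 400 nm to 1800 nm in 10 nm steps
wls = make_wavelengths(400, 1800, 10)
```

If the crash or the duplicates persist with a grid built this way, float rounding in `wls` is probably not the cause.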
Issue Analytics
- State:
- Created 2 years ago
- Comments: 15 (8 by maintainers)
@byquip You are using Python from a miniconda environment, however mpi4py is installed in $HOME/.local. That's suspicious; conda users should just pip install in the environment. Or perhaps the problem is what @leofang pointed out: the environment is not active on all the compute nodes.

This kind of question is better suited for mpi4py's mailing list in Google Groups. I understand that filing an issue on GitHub is very convenient for users, but this increases the load on core developers, and the community watching the mailing list is usually larger. Chances of getting a good tip and advice are higher on the mailing list.
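To check the two suspicions raised above (mpi4py coming from $HOME/.local, and the environment not being active on every compute node), each MPI rank can report which interpreter and which mpi4py it actually uses. This is a rough diagnostic sketch, not something posted in the issue; `describe_environment` and `report_all_ranks` are hypothetical names.

```python
import importlib.util
import site
import socket
import sys


def describe_environment():
    """Collect where this process finds Python and mpi4py."""
    spec = importlib.util.find_spec("mpi4py")  # None if mpi4py is not importable
    return {
        "host": socket.gethostname(),
        "executable": sys.executable,
        "python_version": sys.version.split()[0],
        "user_site": site.getusersitepackages(),
        "mpi4py_path": spec.origin if spec is not None else None,
    }


def report_all_ranks():
    """Gather the description from every MPI rank and print it on rank 0."""
    from mpi4py import MPI  # imported here so describe_environment() works without MPI

    comm = MPI.COMM_WORLD
    infos = comm.gather(describe_environment(), root=0)
    if comm.Get_rank() == 0:
        for rank, info in enumerate(infos):
            print(rank, info)
```

Saved in a script that calls `report_all_ranks()` and launched with the same `srun --mpi=pmi2` options as the job, this prints one line per rank. Ranks that disagree on `executable`, `python_version`, or `mpi4py_path`, or an `mpi4py_path` under `user_site`, would support the explanation given above.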