mpi4py error while getting results (in combination with SLURM)

ERROR:

Traceback (most recent call last):
  File "/opt/software/anaconda/3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/software/anaconda/3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/vasko/.local/lib/python3.6/site-packages/mpi4py/futures/__main__.py", line 72, in <module>
    main()
  File "/home/vasko/.local/lib/python3.6/site-packages/mpi4py/futures/__main__.py", line 60, in main
    run_command_line()
  File "/home/vasko/.local/lib/python3.6/site-packages/mpi4py/run.py", line 47, in run_command_line
    run_path(sys.argv[0], run_name='__main__')
  File "/opt/software/anaconda/3/lib/python3.6/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/opt/software/anaconda/3/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/opt/software/anaconda/3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "cali_send_2.py", line 137, in <module>
    globals()[sys.argv[1]](sys.argv[2], sys.argv[3])
  File "cali_send_2.py", line 94, in solve_on_cali
    sols = list(executor.map(solve_matrix, repeat(inputs), range(len(wls)), wls))
  File "/home/vasko/.local/lib/python3.6/site-packages/mpi4py/futures/pool.py", line 207, in result_iterator
    yield futures.pop().result()
  File "/opt/software/anaconda/3/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/opt/software/anaconda/3/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 5: invalid start byte

ENV:

CentOS release 6.5 (Final)
Python 3.6 (Anaconda)
mpiexec (OpenRTE) 1.8.2
mpi4py 3.0.3

Code snippet:

import os
from itertools import repeat
from zipfile import ZipFile

from mpi4py.futures import MPIPoolExecutor

inputs = [der_mats, ref_ind_yee_grid, n_xy_sq, param_sweep_on, i_m, inv_eps, sol_params]
with MPIPoolExecutor(max_workers=int(nodes)) as executor:
    # list() blocks until every task has finished; leaving the with block
    # shuts the pool down, so no explicit executor.shutdown() is needed.
    sols = list(executor.map(solve_matrix, repeat(inputs), range(len(wls)), wls))

# The context manager closes the archive even if writing fails.
with ZipFile(zp_fl_nm, 'w') as zipobj:
    for sol in sols:
        w, v, solnum, vq = sol
        print(w[0], solnum)  # shows whether the data contains duplicates
        for name, arr in (("w", w), ("v", v), ("vq", vq)):
            fname = f"{name}_sol_{solnum}.npy"
            arr.tofile(fname)  # note: raw binary dump, not the .npy format
            zipobj.write(fname)
            os.remove(fname)

I launch the job by sending a command like this: f'srun --mpi=pmi2 -n ${{SLURM_NTASKS}} python -m mpi4py.futures cali_send_2.py solve_on_cali \"\"{name}\"\" {num_nodes}'

The error does not always appear; it depends on the range I use for wls. With wls = np.arange(0.4e-6, 1.8e-6, 0.01e-6) the job crashes with this error, or, if the step is 0.1e-6, returns duplicates of some solutions. With wls = np.arange(0.55e-6, 1.55e-6, 0.01e-6), and with any other step (0.1e-6 or 0.001e-6), it does NOT crash with the error above and returns good results without duplicates.

Could someone please explain to me the origin of this error? I suspect floating-point values such as 1.699999999999999999999e-6.
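One way to test the floating-point suspicion without involving MPI at all is to look at the wavelength grid on its own. The sketch below is plain Python and only an illustration: float_grid and integer_grid are hypothetical helpers, and float_grid mimics a "start + i * step" grid as an assumption, not as a statement about how np.arange works internally. It checks whether the grid contains repeated values and how far it drifts from a grid built from integer nanometres. It says nothing by itself about where a UnicodeDecodeError could come from.

```python
import math


def float_grid(start, stop, step):
    """Hypothetical helper: values start + i * step that fall below stop."""
    n = math.ceil((stop - start) / step)
    return [start + i * step for i in range(n)]


def integer_grid(start_nm, stop_nm, step_nm):
    """Hypothetical helper: same grid built from integer nanometres,
    converted to metres only at the very end."""
    return [n * 1e-9 for n in range(start_nm, stop_nm, step_nm)]


wls_float = float_grid(0.4e-6, 1.8e-6, 0.01e-6)
wls_int = integer_grid(400, 1800, 10)

# Repeated wavelengths would make the set smaller than the list.
print("float grid:  ", len(wls_float), "values,", len(set(wls_float)), "unique")
print("integer grid:", len(wls_int), "values,", len(set(wls_int)), "unique")

# Largest difference between the two constructions, point by point.
max_diff = max(abs(a - b) for a, b in zip(wls_float, wls_int))
print("largest difference:", max_diff)
```

If both grids contain only unique values and differ by rounding noise, duplicated solutions are unlikely to originate in the wavelength list itself.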

Issue Analytics

Created 2 years ago
Comments: 15 (8 by maintainers)

Top GitHub Comments

dalcinl commented, Apr 7, 2021

@byquip You are using Python from a miniconda environment; however, mpi4py is installed in $HOME/.local. That’s suspicious: conda users should just pip install in the environment. Or perhaps the problem is what @leofang pointed out: the environment is not active on all the compute nodes.
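A quick way to check both points is to have every task report which interpreter it runs and where mpi4py would be imported from. The following is a minimal sketch using only the standard library; describe_environment and the file name check_env.py are hypothetical names chosen for illustration, not part of mpi4py or SLURM.

```python
import importlib.util
import socket
import sys


def describe_environment(module_name="mpi4py"):
    """Report host, interpreter and the location a module would load from."""
    # find_spec locates the module without importing it; it returns None
    # when the module is not installed for this interpreter.
    spec = importlib.util.find_spec(module_name)
    return {
        "host": socket.gethostname(),
        "python": sys.executable,
        "prefix": sys.prefix,
        "module": module_name,
        "module_path": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    info = describe_environment()
    print(" | ".join(f"{key}={value}" for key, value in info.items()))
```

Saved as check_env.py and launched with the same srun options as the real job (for example srun --mpi=pmi2 -n $SLURM_NTASKS python check_env.py), each task prints one line. Apart from the host name, the lines should be identical; a different python path or module_path on some node would point to an environment mismatch.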

dalcinl commented, Apr 7, 2021

This kind of question is better suited to mpi4py’s mailing list in Google Groups. I understand that opening an issue on GitHub is very convenient for users, but it increases the load on core developers, and the community watching the mailing list is usually larger. Chances of getting good tips and advice are higher on the mailing list.