Could dask-mpi run the client script too?
I’ve been dealing with an issue that…well, I was convinced shouldn’t be an issue, so I never said anything about it until dask/dask-blog#5. And after a discussion with @guillaumeeb, I was convinced that maybe I’m not as crazy (or as ill-informed) as I thought I was. So, here’s the issue…
I’ve been trying to figure out a way of launching the Dask Scheduler, Workers, and the client script in the same MPI environment. Currently, the way dask-mpi works is that the Scheduler and the Workers are started, and you separately connect your client (in your separate script) to the Scheduler via, for example, the `scheduler.json` file.
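Concretely, the current two-step pattern looks something like the sketch below (`--scheduler-file` is an existing dask-mpi option and `scheduler_file=` is an existing `distributed.Client` keyword; the file name here is just illustrative):

```python
# Step 1 (shell): start the scheduler and workers under MPI, e.g.
#   mpirun -np 4 dask-mpi --scheduler-file scheduler.json
#
# Step 2 (separate process): connect the client via the shared file.
from dask.distributed import Client

client = Client(scheduler_file="scheduler.json")
```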
I discussed with @guillaumeeb one approach that should work, something like the following:
```bash
# [PBS header info requesting N MPI processes]
mpirun -np N dask-mpi [dask-mpi options] &
python my_dask_script.py
```
However, this launches Scheduler/Worker processes on all N allocated MPI processes, and then the `python my_dask_script.py` process could, potentially, run on the same node as the Scheduler, for example. If you have a compute-intensive client script, this could be problematic.
What I was originally hoping for was a solution that allowed something more like this:
```bash
# [PBS header info requesting N MPI processes]
mpirun -np N dask-mpi [dask-mpi options] --script my_dask_script
```
But after thinking about it for a while, I found that what I really wanted was something that worked like this:
```bash
# [PBS header info requesting N MPI processes]
mpirun -np N python my_dask_mpi_script.py
```
where `my_dask_mpi_script.py` has something like an `import dask_mpi` line that does the following:

- lets MPI rank 0 pass through,
- launches the Scheduler on MPI rank 1 and runs an `IOLoop` until the rank 0 process is complete, and
- launches the Workers on MPI ranks > 1, which also run until the rank 0 process is complete.
At this point, I feel like I could write this myself…except that I don’t know how to implement the “run the `IOLoop` until the rank 0 process is complete” part.
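To make the question concrete, here is one rough shape such coordination might take (purely illustrative, assuming mpi4py and Tornado; none of these names are dask-mpi API): rank 0 notifies the other ranks at exit, and each service rank polls for that notification from inside its running event loop.

```python
# Illustrative sketch only, not a real dask-mpi API: rank 0 signals the
# other ranks when it exits, and each service rank polls for that signal
# from inside its IOLoop.
import atexit

from mpi4py import MPI
from tornado.ioloop import IOLoop, PeriodicCallback

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # Client rank: send a shutdown message to every other rank at exit.
    def notify_others():
        for dest in range(1, comm.Get_size()):
            comm.send(None, dest=dest, tag=99)

    atexit.register(notify_others)
    # ... the user's client script body would run here ...
else:
    loop = IOLoop.current()
    # Rank 1 would create the Scheduler here, and ranks > 1 a Worker,
    # both attached to `loop`.

    request = comm.irecv(source=0, tag=99)  # completes when rank 0 exits

    def check_done():
        done, _ = request.test()
        if done:
            loop.stop()

    PeriodicCallback(check_done, 1000).start()  # poll once per second
    loop.start()  # runs the IOLoop until rank 0 is finished
```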
Any thoughts? Are there different solutions? Would you recommend something different?
Top GitHub Comments
I’ve made https://github.com/dask/dask-mpi and given @kmpaul, @guillaumeeb, and @andersy005 write permissions.
This has now been completed in https://github.com/dask/dask-mpi with dask/dask-mpi#6. The PR implements the “functional initialization” enhancement and the “pulling dask-mpi out of the [distributed] codebase” request.
I will leave it to other dask developers to remove the dask-mpi code from distributed as they see fit.
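For reference, a minimal sketch of what the functional initialization enables, based on the `dask_mpi.initialize()` pattern (exact options and rank assignments may have evolved since that PR):

```python
# my_dask_mpi_script.py (run with: mpirun -np N python my_dask_mpi_script.py)
# A sketch of the functional-initialization pattern; behavior may differ
# across dask-mpi versions.
from dask_mpi import initialize
from dask.distributed import Client

# On the scheduler and worker ranks, initialize() starts the corresponding
# service and blocks; only the client rank falls through to the code below.
initialize()

client = Client()  # connects to the scheduler started by initialize()

# ... regular Dask client code runs here, distributed over the workers ...
```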