Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Could dask-mpi run the client script too?

See original GitHub issue

I’ve been dealing with an issue that…well, I was convinced shouldn’t be an issue, so I never said anything about it until dask/dask-blog#5. And after a discussion with @guillaumeeb, I was convinced that maybe I’m not as crazy (or as ill-informed) as I thought I was. So, here’s the issue…

I’ve been trying to figure out a way of launching the Dask Scheduler, Workers, and the Client script in the same MPI environment. Currently, the way dask-mpi works is that the Scheduler and the Workers are started, and you separately connect your client (in your separate script) to the Scheduler via, for example, the scheduler.json file.

I discussed with @guillaumeeb one approach that should work, something like the following:

# [PBS header info requesting N MPI processes]

mpirun -np N dask-mpi [dask-mpi options] &
python my_dask_script.py

However, this launches Scheduler/Worker processes on all N allocated MPI processes, and then the python my_dask_script.py process could, potentially, run on the same process as the Scheduler, for example. If you have a compute-intensive client script, this could be problematic.

What I was originally hoping for was a solution that allowed something more like this:

# [PBS header info requesting N MPI processes]

mpirun -np N dask-mpi [dask-mpi options] --script my_dask_script

But after thinking about it for a while, I found that what I really wanted was something that worked like this:

# [PBS header info requesting N MPI processes]

mpirun -np N python my_dask_mpi_script.py

where the my_dask_mpi_script.py has something like an import dask-mpi line that does the following:

let’s MPI rank 0 pass through,
launches the Scheduler on MPI rank 1 and runs an IOLoop until the rank 0 process is complete,
launches the Workers on MPI ranks >1, which also run until rank 0 process is complete.

At this point, I feel like I could write this myself…except that I don’t know how to implement the “run the IOLoop until rank 0 process is complete” part.

Any thoughts? Are there different solutions? Would you recommend something different?

Issue Analytics

State:
Created 5 years ago
Reactions:4
Comments:14 (7 by maintainers)

Top GitHub Comments

2reactions

mrocklincommented, Dec 12, 2018

I’ve made https://github.com/dask/dask-mpi and given @kmpaul @guillaumeeb and @andersy005 write permissions.

0reactions

kmpaulcommented, Dec 27, 2018

This has now been completed in https://github.com/dask/dask-mpi with dask/dask-mpi#6. The PR implements the “functional initialization” enhancement and the “pulling dask-mpi out of the [distributed] codebase” request.

I will leave it to other dask developers to remove the dask-mpi code from distributed as they see fit.