Could dask-mpi run the client script too?

See original GitHub issue

I’ve been dealing with an issue that…well, I was convinced shouldn’t be an issue, so I never said anything about it until dask/dask-blog#5. And after a discussion with @guillaumeeb, I was convinced that maybe I’m not as crazy (or as ill-informed) as I thought I was. So, here’s the issue…

I’ve been trying to figure out a way of launching the Dask Scheduler, the Workers, and the Client script in the same MPI environment. Currently, dask-mpi starts only the Scheduler and the Workers. You then connect your Client to the Scheduler from a separate script, for example via the scheduler.json file.
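For context, connecting "via the scheduler.json file" means the client looks up the scheduler's address in a JSON file that the scheduler wrote to shared storage. Below is a minimal stdlib sketch of that lookup. It assumes the file is JSON with an "address" key. read_scheduler_address is a hypothetical helper: with distributed installed you would normally just call Client(scheduler_file="scheduler.json") and let it read the file for you.

```python
import json
import os
import tempfile


def read_scheduler_address(scheduler_file: str) -> str:
    """Return the scheduler address stored in a scheduler file.

    Hypothetical helper for illustration; assumes the file is JSON
    containing an "address" entry such as "tcp://host:port".
    """
    with open(scheduler_file) as f:
        info = json.load(f)
    return info["address"]


if __name__ == "__main__":
    # Write a stand-in scheduler file; the address below is made up.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "scheduler.json")
        with open(path, "w") as f:
            json.dump({"address": "tcp://10.0.0.1:8786"}, f)
        print(read_scheduler_address(path))
```

This indirection through a shared file is what lets the client script run as a completely separate job step from the MPI-launched cluster.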

I discussed with @guillaumeeb one approach that should work, something like the following:

# [PBS header info requesting N MPI processes]

mpirun -np N dask-mpi [dask-mpi options] &
python my_dask_script.py

However, this launches Scheduler/Worker processes on all N allocated MPI processes. The python my_dask_script.py process could then end up running on the same processor as, for example, the Scheduler. If you have a compute-intensive client script, this could be problematic.

What I was originally hoping for was a solution that allowed something more like this:

# [PBS header info requesting N MPI processes]

mpirun -np N dask-mpi [dask-mpi options] --script my_dask_script

But after thinking about it for a while, I found that what I really wanted was something that worked like this:

# [PBS header info requesting N MPI processes]

mpirun -np N python my_dask_mpi_script.py

where my_dask_mpi_script.py has something like an import dask_mpi line that does the following:

  1. lets MPI rank 0 pass through (i.e., continue on to run the client script),
  2. launches the Scheduler on MPI rank 1 and runs an IOLoop until the rank 0 process is complete,
  3. launches the Workers on MPI ranks > 1, which also run until the rank 0 process is complete.
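The rank layout in the list above can be sketched as a pure function from MPI rank to role. In a real script the rank would come from mpi4py (MPI.COMM_WORLD.Get_rank()), and each role would start the corresponding Dask component. Here the function only returns a label. role_for_rank and layout are hypothetical names, and the client_rank/scheduler_rank parameters are an assumption added so that other assignments can be expressed.

```python
from typing import List


def role_for_rank(rank: int, client_rank: int = 0, scheduler_rank: int = 1) -> str:
    """Map an MPI rank to its role in the proposed layout.

    Defaults follow the proposal: rank 0 passes through to the client
    script, rank 1 runs the Scheduler, every other rank runs a Worker.
    """
    if rank < 0:
        raise ValueError(f"MPI ranks are non-negative, got {rank}")
    if client_rank == scheduler_rank:
        raise ValueError("client and scheduler must be on different ranks")
    if rank == client_rank:
        return "client"
    if rank == scheduler_rank:
        return "scheduler"
    return "worker"


def layout(size: int, client_rank: int = 0, scheduler_rank: int = 1) -> List[str]:
    """Roles for every rank in an MPI job with `size` processes."""
    if size < 3:
        raise ValueError("need at least 3 ranks: client, scheduler, one worker")
    return [role_for_rank(r, client_rank, scheduler_rank) for r in range(size)]


if __name__ == "__main__":
    print(layout(5))  # ['client', 'scheduler', 'worker', 'worker', 'worker']
    print(layout(5, client_rank=1, scheduler_rank=0))
```

With N MPI processes, this layout always yields N - 2 Workers. The dask-mpi documentation excerpts quoted further down describe initialize() as using the opposite assignment: the Scheduler on rank 0 and the client on rank 1. Calling layout(N, client_rank=1, scheduler_rank=0) reproduces that layout.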

At this point, I feel like I could write this myself…except that I don’t know how to implement the “run the IOLoop until rank 0 process is complete” part.
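One way to think about "run the IOLoop until rank 0 is complete" is as services that block on a stop signal, which the client sets when it finishes. The sketch below simulates this inside a single process with asyncio.Event. The service and client coroutines are hypothetical stand-ins, not Dask's Scheduler or Worker. In a real MPI job the signal would have to cross process boundaries, for example as an MPI message from rank 0 or as a shutdown request sent through the Dask client. This sketch does not address that part.

```python
import asyncio
from typing import List, Tuple


async def service(name: str, stop: asyncio.Event, log: List[str]) -> None:
    """Stand-in for a Scheduler or Worker event loop.

    Runs until the client signals completion; a real service would be
    handling requests here instead of just waiting.
    """
    log.append(f"{name} started")
    await stop.wait()  # block until the client reports it is finished
    log.append(f"{name} stopped")


async def client(stop: asyncio.Event, log: List[str]) -> int:
    """Stand-in for the user's client script (rank 0 in the proposal)."""
    log.append("client started")
    await asyncio.sleep(0)  # yield so the service tasks get to start
    result = sum(range(10))  # placeholder for the real Dask computation
    log.append("client finished")
    stop.set()  # tell every service loop to exit
    return result


async def main() -> Tuple[int, List[str]]:
    stop = asyncio.Event()
    log: List[str] = []
    services = [
        asyncio.create_task(service(name, stop, log))
        for name in ("scheduler", "worker-2", "worker-3")
    ]
    result = await client(stop, log)
    await asyncio.gather(*services)  # services return once stop is set
    return result, log


if __name__ == "__main__":
    result, log = asyncio.run(main())
    print(result)
    print(log)
```

The key design point is that the services never decide on their own when to exit: their lifetime is tied entirely to the client's completion signal.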

Any thoughts? Are there different solutions? Would you recommend something different?

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Reactions: 4
  • Comments: 14 (7 by maintainers)

Top GitHub Comments

2 reactions
mrocklin commented, Dec 12, 2018

I’ve made https://github.com/dask/dask-mpi and given @kmpaul @guillaumeeb and @andersy005 write permissions.

0 reactions
kmpaul commented, Dec 27, 2018

This has now been completed in https://github.com/dask/dask-mpi with dask/dask-mpi#6. The PR implements the “functional initialization” enhancement and the “pulling dask-mpi out of the [distributed] codebase” request.

I will leave it to other dask developers to remove the dask-mpi code from distributed as they see fit.


Top Results From Across the Web

Dask-mpi Documentation
On MPI rank 1, the initialize() function “passes through” to the Client script, running the Dask-based Client code the user wishes to execute....
How Dask-MPI Works
When using the initialize() method, Dask-MPI runs the Client script on MPI rank 1 and launches the Workers on the remaining MPI ranks...
Dask-MPI — Dask-MPI 2022.4.0+15.g74eab41.dirty ...
The Dask-MPI project makes it easy to deploy Dask from within an existing MPI environment, such as one created with the common MPI...
dask_mpi.core.initialize - Dask-MPI
Initialize a Dask cluster using mpi4py. Using mpi4py, MPI rank 0 launches the Scheduler, MPI rank 1 passes through to the client script,...
Dask-MPI with Batch Jobs
However, in batch mode, you need the script running your Dask Client to run in the same environment in which your Dask cluster...
