Dask with jobqueue not using multiple nodes
I am trying to use Dask to do parallel processing across multiple nodes on supercomputing resources, yet the Dask-distributed map only takes advantage of one of the nodes. Note that I posted this on Stack Overflow but it didn't get any attention, so now I'm trying here.
Here is a test script I am using to set up the client and perform a simple operation:
import time
from distributed import Client
from dask_jobqueue import SLURMCluster
from socket import gethostname

def slow_increment(x):
    time.sleep(10)
    return [x + 1, gethostname(), time.time()]

cluster = SLURMCluster(
    queue='somequeue',
    cores=2,
    memory='128GB',
    project='someproject',
    walltime='00:05:00',
    job_extra=['-o myjob.%j.%N.out',
               '-e myjob.%j.%N.error'],
    env_extra=['export I_MPI_FABRICS=dapl',
               'source activate dask-jobqueue'])
cluster.scale(2)

client = Client(cluster)

A = client.map(slow_increment, range(8))
B = client.gather(A)
print(client)
for res in B:
    print(res)
client.close()
And here is the output:
<Client: scheduler='tcp://someip' processes=2 cores=4>
[1, 'bdw-0478', 1540477582.6744401]
[2, 'bdw-0478', 1540477582.67487]
[3, 'bdw-0478', 1540477592.68666]
[4, 'bdw-0478', 1540477592.6879778]
[5, 'bdw-0478', 1540477602.6986163]
[6, 'bdw-0478', 1540477602.6997452]
[7, 'bdw-0478', 1540477612.7100565]
[8, 'bdw-0478', 1540477612.711296]
While the printed client info indicates that Dask has the correct number of nodes (processes) and tasks per node (cores), the socket.gethostname() output and the timestamps show that the second node isn't being used. I do know that dask-jobqueue successfully requested two nodes and that both jobs complete at the same time. I tried different MPI fabrics for inter- and intra-node communication (e.g. tcp, shm:tcp, shm:ofa, ofa, ofi, dapl), but this did not change the result. I also tried removing the "export I_MPI_FABRICS" command and using the "interface" option instead, but this caused the code to hang.
Thanks in advance for any assistance.
-Noah
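One way to narrow this down is to confirm that both workers actually connect to the scheduler and to see which hosts they run on. The snippet below is a minimal diagnostic sketch, not part of the original report: 'ib0' is only an assumed name for the node's high-speed interface (check with ip addr on a compute node), and wait_for_workers requires a reasonably recent version of distributed.

from distributed import Client
from dask_jobqueue import SLURMCluster

# 'interface' binds the scheduler and workers to the fast interconnect;
# 'ib0' is an assumed interface name and must match the actual system.
cluster = SLURMCluster(
    queue='somequeue',
    cores=2,
    memory='128GB',
    project='someproject',
    walltime='00:05:00',
    interface='ib0')
cluster.scale(2)

client = Client(cluster)
client.wait_for_workers(2)  # block until both Slurm jobs have connected

# Print each worker's address and host; an idle second node shows up
# immediately as a missing hostname here.
for addr, info in client.scheduler_info()['workers'].items():
    print(addr, info['host'])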
Top GitHub Comments
@appassionate We created a library that does what you are asking for, but I admit it requires quite a bit of configuration, since you need to tell it about the system and how you launch MPI jobs there. The library is at https://github.com/E-CAM/jobqueue_features and there's a tutorial at https://github.com/E-CAM/jobqueue_features_workshop_materials (with a recording of the tutorial at https://www.youtube.com/watch?v=FpMua8iJeTk&ab_channel=E-CAM).
I haven't touched it in a few months; I need to check whether our CI is still passing. That package works with the latest version of dask-jobqueue (0.8.1).

Thanks for your suggestion! jobqueue_features customizes SLURMCluster for MPI use, and I believe it will support running on more nodes under Slurm, which suits my case.
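For reference, the usage pattern jobqueue_features targets looks roughly like the sketch below. This is only an illustration based on the project's tutorial material: the class, decorator, and keyword names (CustomSLURMCluster, on_cluster, mpi_task, mpi_wrap, nodes, mpi_mode, exec_args) are assumptions that should be checked against the current repository, and the queue name and script path are placeholders.

from jobqueue_features.clusters import CustomSLURMCluster
from jobqueue_features.decorators import on_cluster, mpi_task
from jobqueue_features.mpi_wrapper import mpi_wrap

# A cluster whose Slurm jobs span two nodes and whose tasks are launched
# through the site's MPI runner (names and keywords are assumptions, see above).
mpi_cluster = CustomSLURMCluster(
    name='mpi_cluster',
    queue='somequeue',
    walltime='00:05:00',
    nodes=2,
    mpi_mode=True)

@mpi_task(cluster_id='mpi_cluster')
def mpi_wrap_task(**kwargs):
    # mpi_wrap prepends the configured MPI launcher (srun, mpirun, ...)
    return mpi_wrap(**kwargs)

@on_cluster(cluster=mpi_cluster, cluster_id='mpi_cluster')
def main():
    # Placeholder executable and script; replace with a real MPI program.
    task = mpi_wrap_task(executable='python', exec_args='/path/to/mpi_script.py')
    print(task.result())

main()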