Can dask-jobqueue use multi-node jobs (Was: Creating dask-jobqueue cluster from a single job versus multiple jobs)?
This is not strictly an “issue” and is more a question about suggested usage, so if this question belongs somewhere else, please direct me there!
I’ve been working closely with admins of the NASA Pleiades HPC system on how best to support interactive Dask workflows on that system. Pleiades uses PBS. Thus far, my workflow has been to configure a cluster in which a single worker uses all available cores and memory on a single node. For example, for a machine that has 12 cores and 48 GB of memory per node, my jobqueue config is the following:
```yaml
jobqueue:
  pbs:
    cores: 12
    processes: 1
    memory: 48GB
    interface: ib0
    resource_spec: "select=1:ncpus=12"
    walltime: "00:30:00"
```
I then request 10-20 nodes by starting an equivalent number of jobs, e.g. by running `cluster.scale(10)`. A lot of the time, this configuration works well, but one of the problems that has cropped up (and is common to many systems) is limited availability of resources, i.e. when the cluster is busy, I may have to wait > 30 minutes for even a single job to start; not so ideal for interactive workflows! The entire Pleiades system is in very high demand, so this is often an issue, particularly on the newer, faster processors.
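For concreteness, here is roughly what this multi-job workflow looks like on the Python side (a minimal sketch, assuming the jobqueue config above is picked up from the default Dask config location):

```python
# Minimal sketch of the multi-job workflow (assumes the PBS settings above
# are read from the default Dask config, e.g. ~/.config/dask/jobqueue.yaml).
from dask.distributed import Client
from dask_jobqueue import PBSCluster

cluster = PBSCluster()    # one job = one node = one 12-core / 48 GB worker
cluster.scale(10)         # submit 10 independent PBS jobs
client = Client(cluster)  # workers join the scheduler as their jobs start
```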
After raising this issue with the HPC staff, one suggested solution was to use a high-availability queue (called “devel”) that allows users to submit only a single job at a time, but has very high availability (i.e. short waiting times). In this case, the suggested pattern would be to submit a single job that requests multiple cores on multiple nodes. For 25 nodes, each with 24 cores and 128 GB of memory, the jobqueue config is:
```yaml
jobqueue:
  pbs:
    cores: 200
    memory: 2000GB
    processes: 200
    interface: ib0
    queue: devel
    resource_spec: 'select=25:ncpus=8:model=has'
    walltime: "00:30:00"
```
Then, the user would call `cluster.scale(1)` once and be done. This satisfies the one-job restriction of the high-availability queue, reduces wait times since you have asked for all resources up front, and is also more “friendly” to the scheduler as it does not involve submitting many jobs. This pattern of course does not permit any scaling up or down, but that is a separate issue.
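For comparison, the single-job pattern looks the same from the Python side, just scaled to one job (again a sketch, assuming the devel-queue config above is in place):

```python
# Sketch of the single-job pattern (assumes the devel-queue config above).
from dask.distributed import Client
from dask_jobqueue import PBSCluster

cluster = PBSCluster()    # reads queue: devel, 25-node resource_spec, 200 processes
cluster.scale(1)          # a single PBS job requesting all 25 nodes
client = Client(cluster)
# Note: as far as I can tell, the generated job script runs a single
# dask-worker command, so it is not obvious to me how those 200 processes
# get spread across the 25 nodes in the allocation.
```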
This is quite different from the multi-job workflow I’ve used previously (and the one that seems to be recommended in the dask-jobqueue docs), and I’m trying to wrap my head around whether this makes sense. My main questions are:
- Does this configuration pattern make sense in the context of dask-jobqueue, and are there any disadvantages (other than scalability)? The single-job cluster seems like an anti-pattern to me, but I certainly understand why it is preferable from the HPC admin perspective.
- Which node is this single Dask worker running on?
- How is work being parallelized across multiple nodes if only a single job is running?
I’ve experimented with the above single-job workflow (and minor variations on it) and have found that computations (which worked just fine in the multi-job context) will lock up and/or result in a killed worker. However, it is not entirely clear to me why this is happening.
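For what it’s worth, the way I have been checking which node(s) the workers actually land on is roughly the following (a hypothetical snippet; `cluster` is the `PBSCluster` from either configuration above):

```python
import socket
from dask.distributed import Client

client = Client(cluster)               # cluster from either config above
# Run gethostname on every worker: returns {worker address: node hostname}
print(client.run(socket.gethostname))
```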
I apologize for the lengthy post! I’m trying to get a sense of the optimal usage pattern here given the many different configuration options, and trying to wrap my head around how all of this actually works. Any advice would be extremely helpful!
Top GitHub Comments
FYI I changed the title to reflect the discussion. Feel free to edit it or suggest a better title!
@guillaumeeb @lesteve Thanks for all of your help on this. I think we have a clearer picture of how to proceed with our cluster configuration on Pleiades. I’m going to close this (again!) as we seem to have resolved our main issue, but will reopen if we run into more problems.