Example PBS Script


People using Dask on traditional job schedulers often depend on PBS scripts. It would be useful to include a plain example in the documentation that users can download, modify, and run.

What we do now

Currently we point users to the setup network docs, and in particular the section about using job schedulers with a shared network file system. The instructions there suggest that users submit two jobs, one for the scheduler and one for the workers:

# Start a dask-scheduler somewhere and write connection information to file
qsub -b y /path/to/dask-scheduler --scheduler-file /path/to/scheduler.json

# Start 100 dask-worker processes in an array job pointing to the same file
qsub -b y -t 1-100 /path/to/dask-worker --scheduler-file /path/to/scheduler.json

However, this is flawed because the scheduler and workers may start and run independently of each other. It would be better to place them in a single job, where one special node runs the dask-scheduler process and all other nodes run dask-worker processes. Additionally, we would like to offer some guidance on tuning the number of CPUs and on pointing workers at local high-speed scratch disk if it is available.

PBS script options

Many docs on PBS scripts exist online, but each seems to have been written by the IT group of a particular supercomputing center. It is difficult to tease out what is general to all systems and what is specific to a single supercomputer or job scheduler. After reading a number of these pages I’ve cobbled together the following example.

#!/bin/bash -login
# Configure these values to change the size of your dask cluster
#PBS -t 1-9                 # Nine nodes.  One scheduler and eight workers
#PBS -l ncpus=4             # Four cores per node.
#PBS -l mem=20GB            # 20 GB of memory per node
#PBS -l walltime=01:00:00   # will run for at most one hour

# Environment variables
export OMP_NUM_THREADS=1

# Write ~/scheduler.json file in home directory
# connect with
# >>> from dask.distributed import Client
# >>> client = Client(scheduler_file='~/scheduler.json')

# Start scheduler on first process, workers on all others
if [[ $PBS_ARRAYID == '1' ]]; then
    dask-scheduler --scheduler-file $HOME/scheduler.json;
else
    # Workers use local scratch ($TMPDIR) and log to the submission directory
    dask-worker \
        --scheduler-file $HOME/scheduler.json \
        --nthreads $PBS_NUM_PPN \
        --local-directory $TMPDIR \
        --name worker-$PBS_ARRAYID \
        > $PBS_O_WORKDIR/$PBS_JOBID-$PBS_ARRAYID.out \
        2> $PBS_O_WORKDIR/$PBS_JOBID-$PBS_ARRAYID.err;
fi
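
Once the job is running, a quick way to check the cluster from a login node is to connect a client against the scheduler file. The following is a minimal sketch (not part of the script above); it assumes the dask.distributed package is available where you run it, and wait_for_workers may not exist in very old versions of distributed:

from dask.distributed import Client

# Connect using the scheduler file written by the job above
client = Client(scheduler_file='~/scheduler.json')

# Optionally block until at least one worker has registered (timeout in seconds)
client.wait_for_workers(n_workers=1, timeout=120)

# Inspect scheduler and worker information to confirm the expected cluster size
print(client.scheduler_info())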

  • https://wiki.hpcc.msu.edu/display/hpccdocs/Advanced+Scripting+Using+PBS+Environment+Variables
  • http://www.pbsworks.com/documentation/support/PBSProUserGuide10.4.pdf

Questions

  • What is the difference between ncpus and ppn?
  • How about -t 1-8 and nodes=8?

Does this actually work? I suspect not. I don’t have a convenient testing system for this and would appreciate coverage by a few different groups.

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Reactions: 2
  • Comments: 127 (93 by maintainers)

Top GitHub Comments

1 reaction
davidedelvento commented, Aug 23, 2017

And how about people who are using LSF, Slurm, or other queuing systems? I still think having your own is more flexible and more robust.

0 reactions
mrocklin commented, Jun 21, 2018

This has been resolved by the dask-jobqueue project.
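
For anyone reaching this issue now, a minimal sketch of the dask-jobqueue approach looks roughly like the following; the resource values are placeholders and keyword names may vary between dask-jobqueue versions:

from dask_jobqueue import PBSCluster
from dask.distributed import Client

# Each PBS job runs one worker with these resources (illustrative values)
cluster = PBSCluster(
    cores=4,
    memory='20GB',
    walltime='01:00:00',
    local_directory='$TMPDIR',   # use fast local scratch if the site provides it
)

cluster.scale(8)         # scale to eight workers (one PBS job per worker by default)
client = Client(cluster)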
