"Baseline" workers combination with submitted worker jobs
I am working on the Princeton cluster tigercpu, which presents some challenges to an efficient workflow with dask-jobqueue: small, fast-starting jobs are somewhat discouraged, e.g. most of the time the fastest-starting jobs have 2 nodes (each with 40 cores) if submitted as a batch script. Interactive jobs usually start faster.
My current workflow looks something like this:
1. Request a single node as an interactive job (fast), start a Jupyter notebook there, start a SLURMCluster, and ssh to this node.
2. Request another interactive job (more resources, usually fairly fast when requested as an interactive session) and manually execute the job script created by the SLURMCluster, connecting its workers to the scheduler in the notebook.
3. Dask away.
This is quite cumbersome. If I replaced step 2 with the actual dask-jobqueue functionality (letting the SLURMCluster submit the worker jobs itself), the workflow would be much cleaner, but wait times can be longer.
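For reference, steps 1 and 2 look roughly like this in code; the partition name, core count, memory, and walltime are placeholder values, not tigercpu's actual settings:

```python
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

# Step 1: run inside the Jupyter notebook on the interactive node.
# All resource values here are placeholders; adjust to the cluster's queues.
cluster = SLURMCluster(
    queue="cpu",          # hypothetical partition name
    cores=40,
    memory="192GB",
    walltime="01:00:00",
)
client = Client(cluster)

# Step 2: instead of letting dask-jobqueue submit the job, print the
# generated batch script and run it by hand inside a second interactive
# allocation.
print(cluster.job_script())
```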
Is there a way to start a few workers on the same node as the scheduler (created in step 1) directly from the SLURMCluster? I guess this is somewhat a combination of LocalCluster and SLURMCluster?
That would be ideal, since it would immediately provide a few baseline workers to explore the data and do some preliminary analysis before the real compute power comes online.
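To make that concrete, a rough sketch of the kind of thing I'm after, assuming the notebook and scheduler share a node (the worker CLI is version-dependent: older distributed releases ship `dask-worker` with `--nprocs`, newer ones `dask worker` with `--nworkers`):

```python
import subprocess

# Launch a few "baseline" worker processes on this node, pointed at the
# scheduler the SLURMCluster above already started.  The CLI name and
# flag spellings below assume a reasonably recent distributed release;
# adjust to whatever your installed version provides.
local_workers = subprocess.Popen(
    [
        "dask-worker",
        cluster.scheduler_address,
        "--nworkers", "4",
        "--nthreads", "2",
        "--memory-limit", "8GB",
    ]
)

# The SLURM-submitted workers join the same scheduler later, once their
# jobs make it through the queue.
cluster.scale(jobs=1)
```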
Top GitHub Comments
Closing this issue as stale; this should be fixed once we implement #419.
Another solution would be to manually start a worker from inside the notebook where the SLURMCluster has been created.
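For anyone landing here later, a minimal sketch of that suggestion: a Jupyter notebook already has a running event loop, so you can `await` a Worker directly (the thread count and memory limit are just example values):

```python
from dask.distributed import Worker

# Start one extra worker in the notebook's own process, attached to the
# scheduler that the SLURMCluster created.  Use distributed.Nanny instead
# if you want it in a separate, supervised process.
baseline_worker = await Worker(
    cluster.scheduler_address,
    nthreads=2,
    memory_limit="4GB",
)
```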
Awesome. Thanks. And sorry for the radio silence. I am actually not at Princeton anymore, but will keep these tips in mind the next time I work on an HPC!