Cluster keeps appending "interface" flag to job script
When I run the following code snippet:
```python
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(memory="100GB", cores=40, interface="ib0")
for _ in range(20):
    _ = cluster.job_script()
print(cluster.job_script())
```
I get the following output:
```bash
#!/usr/bin/env bash
#SBATCH -J dask-worker
#SBATCH -n 1
#SBATCH --cpus-per-task=40
#SBATCH --mem=94G
#SBATCH -t 00:30:00
/path/to/python -m distributed.cli.dask_worker tcp://xx.xx.xx.xx:pppp --nthreads 5 --nworkers 8 --memory-limit 11.64GiB --name dummy-name --nanny --death-timeout 60 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0 --interface ib0
```
It seems that every time the `job_script()` method is called, the interface flag is appended to it. I am running dask-jobqueue==0.8.0 off the conda-forge channel (Python 3.10, if relevant).

You can replicate this with `PBSCluster` as well. The issue seems to be in how `worker_extra_args` is handled in the `__init__()` method of `dask_jobqueue.core.Job`; a simplified sketch of the failure mode follows.
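Here is a minimal, self-contained sketch of that failure mode, assuming the cause is a config-owned list being extended in place. The names `Job`, `worker_extra_args`, and `interface` mirror dask-jobqueue, but `CONFIG` and the class body are simplified stand-ins, not the library's actual code:

```python
# Stands in for the dask config: one list object shared by every call.
CONFIG = {"worker-extra-args": []}

class Job:
    def __init__(self, interface=None, worker_extra_args=None):
        if worker_extra_args is None:
            # Every instance gets a reference to the *same* config list.
            worker_extra_args = CONFIG["worker-extra-args"]
        if interface is not None:
            # BUG: "+=" extends the config-owned list in place, so the
            # flag accumulates across Job instances.
            worker_extra_args += ["--interface", interface]
        self.worker_extra_args = worker_extra_args

# Each job_script() call constructs a new Job under the hood, so the
# shared list grows by one "--interface ib0" pair per call:
for _ in range(3):
    job = Job(interface="ib0")
print(job.worker_extra_args)
# ['--interface', 'ib0', '--interface', 'ib0', '--interface', 'ib0']
```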
Top GitHub Comments
Nice catch and many thanks @jolange.
Ok, this is a subtle combination of two things: `worker_extra_args` comes out of the shared config, and at https://github.com/dask/dask-jobqueue/blob/f79f9136542abd86566e0a36f7370c144052ee9d/dask_jobqueue/core.py#L226-L228 the assignment `X = X + NEW` was changed to `X += NEW`. The `+=` changes the config in place, whereas before `worker_extra_args` (or `extra`, as it was called then) was newly assigned. I will revert that!
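That one-character difference is plain Python list semantics, shown here in isolation (not dask-jobqueue code): `+=` calls `list.__iadd__` and mutates the existing object, while `X = X + NEW` builds a new list and rebinds the name:

```python
# Augmented assignment: the dict's list itself is mutated.
config = {"worker-extra-args": []}
args = config["worker-extra-args"]
args += ["--interface", "ib0"]
assert config["worker-extra-args"] == ["--interface", "ib0"]  # flag leaked

# Plain assignment: a fresh list is created; the config stays clean.
config = {"worker-extra-args": []}
args = config["worker-extra-args"]
args = args + ["--interface", "ib0"]
assert config["worker-extra-args"] == []  # untouched
```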