
PBS scheduling reports error on the generated PBSCluster job scripts

See original GitHub issue

Hello,

I’ve been trying to use Dask for data-processing work that needs more memory, via dask-jobqueue (v0.7.1) on an HPC cluster that uses PBS scheduling. The following are the steps I used, based on the Dask-jobqueue documentation.

from dask_jobqueue import PBSCluster
from dask.distributed import Client

cluster = PBSCluster(cores=24,
                     memory="108GB",
                     queue="internal",
                     project="ERTH0834",
                     resource_spec="select=1:ncpus=24:mem=120GB",
                     walltime="05:00:00",
                     interface="ib0",
                     scheduler_options={"dashboard_address": ":9375"})

cluster.scale(jobs=8)
client = Client(cluster)

After running this script I expected things to work just fine, but I am getting an error message like the following:

Task exception was never retrieved
future: <Task finished coro=<_wrap_awaitable() done, defined at /home/ldjeutchouang/scratch/CONDA/envs/researchPy37/lib/python3.7/asyncio/tasks.py:596> exception=RuntimeError('Command exited with non-zero exit code.\nExit code: 32\nCommand:\nqsub /var/tmp/pbs.2985597.sched01/tmpzidjg2zw.sh\nstdout:\n\nstderr:\nqsub: Error: You must request a project with -P <project>.\n\n')>
Traceback (most recent call last):
  File "/home/ldjeutchouang/scratch/CONDA/envs/researchPy37/lib/python3.7/asyncio/tasks.py", line 603, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/home/ldjeutchouang/scratch/CONDA/envs/researchPy37/lib/python3.7/site-packages/distributed/deploy/spec.py", line 50, in _
    await self.start()
  File "/home/ldjeutchouang/scratch/CONDA/envs/researchPy37/lib/python3.7/site-packages/dask_jobqueue/core.py", line 310, in start
    out = await self._submit_job(fn)
  File "/home/ldjeutchouang/scratch/CONDA/envs/researchPy37/lib/python3.7/site-packages/dask_jobqueue/core.py", line 293, in _submit_job
    return self._call(shlex.split(self.submit_command) + [script_filename])
  File "/home/ldjeutchouang/scratch/CONDA/envs/researchPy37/lib/python3.7/site-packages/dask_jobqueue/core.py", line 393, in _call
    "stderr:\n{}\n".format(proc.returncode, cmd_str, out, err)
RuntimeError: Command exited with non-zero exit code.
Exit code: 32
Command:
qsub /var/tmp/pbs.2985597.sched01/tmpzidjg2zw.sh
stdout:

stderr:
qsub: Error: You must request a project with -P <project>.

When I print the job script, it generates the following, where the project is passed with the -A flag instead of -P as this PBS installation requires.

#!/usr/bin/env bash

#PBS -N dask-worker
#PBS -q internal
#PBS -A ERTH0834
#PBS -l select=1:ncpus=24:mem=120GB
#PBS -l walltime=05:00:00

/home/ldjeutchouang/scratch/CONDA/envs/researchPy37/bin/python -m distributed.cli.dask_worker tcp://172.19.2.39:37047 --nthreads 24 --memory-limit 108.00GB --name name --nanny --death-timeout 60 --interface ib0
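What I believe the scheduler wants is the same header with a -P directive in place of the -A line:

```
#PBS -N dask-worker
#PBS -q internal
#PBS -P ERTH0834
#PBS -l select=1:ncpus=24:mem=120GB
#PBS -l walltime=05:00:00
```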

I thought there might be a way to replace -A with -P, but I am stuck. Any help would be greatly appreciated.

Many thanks in advance, Djeutsch

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

2 reactions
ocaisa commented, Aug 14, 2020

That’s not normal for PBS according to http://docs.adaptivecomputing.com/torque/4-0-2/Content/topics/commands/qsub.htm (but I don’t use PBS, so maybe this info is outdated?).

You can always use the job_extra keyword argument to set unusual/alternative parameters; in this case

job_extra=['-P ERTH0834'],

should hopefully do the job.
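As a rough sketch of why this works (illustrative code only, not dask-jobqueue's actual implementation): the job template emits a fixed set of #PBS directives, rendering the project keyword with -A, and then appends every entry of job_extra verbatim as its own directive, which is where a site-specific flag like -P can be injected:

```python
def build_pbs_header(name, queue, walltime, resource_spec,
                     project=None, job_extra=()):
    """Illustrative sketch of how a PBS job-script header is assembled."""
    lines = [f"#PBS -N {name}", f"#PBS -q {queue}"]
    if project is not None:
        # the `project` kwarg is rendered with -A ...
        lines.append(f"#PBS -A {project}")
    lines.append(f"#PBS -l {resource_spec}")
    lines.append(f"#PBS -l walltime={walltime}")
    # ... while each job_extra entry is appended verbatim as its own
    # directive, so a site-specific flag such as -P lands here
    lines.extend(f"#PBS {extra}" for extra in job_extra)
    return "\n".join(lines)

# Request the project with -P via job_extra instead of project=
print(build_pbs_header("dask-worker", "internal", "05:00:00",
                       "select=1:ncpus=24:mem=120GB",
                       job_extra=["-P ERTH0834"]))
```

With project= omitted, no -A line is emitted at all, and the -P directive from job_extra satisfies the scheduler's requirement.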

0 reactions
guillaumeeb commented, Sep 3, 2021

Closing as stalled. @Djeutsch feel free to reopen if you’ve got something to add.
