PBS scheduling reports error on the generated PBSCluster job scripts
Hello,
I've been trying to use Dask for data processing work that needs more memory, using dask-jobqueue (v0.7.1) on an HPC cluster that uses PBS scheduling. These are the steps I followed, based on the Dask jobqueue documentation:
from dask_jobqueue import PBSCluster
from dask.distributed import Client

cluster = PBSCluster(cores=24,
                     memory="108GB",
                     queue="internal",
                     project="ERTH0834",
                     resource_spec="select=1:ncpus=24:mem=120GB",
                     walltime="05:00:00",
                     interface="ib0",
                     scheduler_options={"dashboard_address": ":9375"})
cluster.scale(jobs=8)
client = Client(cluster)
After running this script I expected things to work just fine, but I am getting an error message like the following:
Task exception was never retrieved
future: <Task finished coro=<_wrap_awaitable() done, defined at /home/ldjeutchouang/scratch/CONDA/envs/researchPy37/lib/python3.7/asyncio/tasks.py:596> exception=RuntimeError('Command exited with non-zero exit code.\nExit code: 32\nCommand:\nqsub /var/tmp/pbs.2985597.sched01/tmpzidjg2zw.sh\nstdout:\n\nstderr:\nqsub: Error: You must request a project with -P <project>.\n\n')>
Traceback (most recent call last):
  File "/home/ldjeutchouang/scratch/CONDA/envs/researchPy37/lib/python3.7/asyncio/tasks.py", line 603, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/home/ldjeutchouang/scratch/CONDA/envs/researchPy37/lib/python3.7/site-packages/distributed/deploy/spec.py", line 50, in _
    await self.start()
  File "/home/ldjeutchouang/scratch/CONDA/envs/researchPy37/lib/python3.7/site-packages/dask_jobqueue/core.py", line 310, in start
    out = await self._submit_job(fn)
  File "/home/ldjeutchouang/scratch/CONDA/envs/researchPy37/lib/python3.7/site-packages/dask_jobqueue/core.py", line 293, in _submit_job
    return self._call(shlex.split(self.submit_command) + [script_filename])
  File "/home/ldjeutchouang/scratch/CONDA/envs/researchPy37/lib/python3.7/site-packages/dask_jobqueue/core.py", line 393, in _call
    "stderr:\n{}\n".format(proc.returncode, cmd_str, out, err)
RuntimeError: Command exited with non-zero exit code.
Exit code: 32
Command:
qsub /var/tmp/pbs.2985597.sched01/tmpzidjg2zw.sh
stdout:
stderr:
qsub: Error: You must request a project with -P <project>.
When I print the generated job script, the project is passed with -A instead of -P, which is what qsub on this cluster asks for in the error above:
#!/usr/bin/env bash
#PBS -N dask-worker
#PBS -q internal
#PBS -A ERTH0834
#PBS -l select=1:ncpus=24:mem=120GB
#PBS -l walltime=05:00:00
/home/ldjeutchouang/scratch/CONDA/envs/researchPy37/bin/python -m distributed.cli.dask_worker tcp://172.19.2.39:37047 --nthreads 24 --memory-limit 108.00GB --name name --nanny --death-timeout 60 --interface ib0
I thought there might be a way to replace -A with -P, but I am stuck. Any help would be greatly appreciated.
Many thanks in advance,
Djeutsch
Issue Analytics
- State:
- Created 3 years ago
- Comments:5 (4 by maintainers)
Top GitHub Comments
That’s not normal for PBS according to http://docs.adaptivecomputing.com/torque/4-0-2/Content/topics/commands/qsub.htm (but I don’t use PBS, so maybe this info is outdated?).
You can always use the job-extra kwarg to set unusual/alternative parameters, which in this case should hopefully do the job.
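A minimal sketch of that suggestion, reusing the values from the original post. It assumes that in dask-jobqueue 0.7.x each entry of the `job_extra` keyword is written to the job script as its own `#PBS ...` line, and that leaving out `project=` drops the `#PBS -A` line as long as no project is set in the jobqueue config or the `PBS_ACCOUNT` environment variable. The helper names `pbs_cluster_kwargs` and `make_cluster` are hypothetical and only there for illustration.

```python
def pbs_cluster_kwargs(project_code):
    """Build PBSCluster keyword arguments that request the project with -P."""
    return {
        "cores": 24,
        "memory": "108GB",
        "queue": "internal",
        # project= is deliberately omitted so that no "#PBS -A ..." line is
        # generated (assumption: no project is configured elsewhere).
        # The project is passed as an extra directive instead.
        "job_extra": ["-P {}".format(project_code)],
        "resource_spec": "select=1:ncpus=24:mem=120GB",
        "walltime": "05:00:00",
        "interface": "ib0",
        "scheduler_options": {"dashboard_address": ":9375"},
    }


def make_cluster(project_code="ERTH0834"):
    """Create the cluster; dask-jobqueue is only imported when this is called."""
    from dask_jobqueue import PBSCluster

    return PBSCluster(**pbs_cluster_kwargs(project_code))
```

Before calling `cluster.scale(jobs=8)`, it is worth running `print(make_cluster().job_script())` and checking that the header now has a `#PBS -P ERTH0834` line and no `#PBS -A` line.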
Closing as stalled. @Djeutsch feel free to reopen if you’ve got something to add.