How to run toil-cwl-runner on a local HPC with the torque batch system?

I ran the following commands on the login node:

export TOIL_TORQUE_ARGS="-N graham-rhapsody-wta -e admin:/public/home/graham/src/rhapsody-wta/logs"
export TOIL_TORQUE_REQS="walltime=48:00:00,mem=192gb,nodes=2:ppn=16"

toil-cwl-runner \
  --batchSystem torque \
  --user-space-docker-cmd=udocker \
  --jobStore file:results/rhapsody-wta-job-store \
  --outdir results \
  --writeLogs logs \
  --logFile cwltoil.log \
  --logLevel DEBUG \
  --retryCount 2 \
  --maxLogFileSize 20000000000 \
  --stats \
  rhapsody-wta-yaml.cwl template_wta.yml

I saw this error:

Additional Torque resource requirements appended to qsub from TOIL_TORQUE_REQS env. variable: walltime=48:00:00,mem=192gb,nodes=2:ppn=16
GridEngine like batch system failure
Traceback (most recent call last):
  File "/public/home/weiwanqian/miniconda3/envs/bioinfo/lib/python3.6/site-packages/toil/batchSystems/abstractGridEngineBatchSystem.py", line 226, in run
    while self._runStep():
  File "/public/home/weiwanqian/miniconda3/envs/bioinfo/lib/python3.6/site-packages/toil/batchSystems/abstractGridEngineBatchSystem.py", line 215, in _runStep
    activity |= self.createJobs(newJob)
  File "/public/home/weiwanqian/miniconda3/envs/bioinfo/lib/python3.6/site-packages/toil/batchSystems/abstractGridEngineBatchSystem.py", line 121, in createJobs
    subLine = self.prepareSubmission(cpu, memory, jobID, command, jobName)
  File "/public/home/weiwanqian/miniconda3/envs/bioinfo/lib/python3.6/site-packages/toil/batchSystems/torque.py", line 121, in prepareSubmission
    return self.prepareQsub(cpu, memory, jobID) + [self.generateTorqueWrapper(command, jobID)]
  File "/public/home/weiwanqian/miniconda3/envs/bioinfo/lib/python3.6/site-packages/toil/batchSystems/torque.py", line 179, in prepareQsub
    raise ValueError("Incompatible resource arguments ('mem=', 'nodes=', 'ppn='): {}".format(reqlineEnv))
ValueError: Incompatible resource arguments ('mem=', 'nodes=', 'ppn='): walltime=48:00:00,mem=192gb,nodes=2:ppn=16

Is it not possible to specify hardware resources (memory, nodes, processors per node) in the TOIL_TORQUE_REQS variable?
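The error message names three keys ('mem=', 'nodes=', 'ppn=') that are rejected in TOIL_TORQUE_REQS, and the traceback shows prepareQsub receiving cpu and memory for each job, which suggests Toil sets those values itself. The following is only a minimal sketch of a pre-flight check, not Toil's actual code: the helper names find_incompatible_keys and strip_incompatible are hypothetical, and the substring matching is an assumption based on the wording of the error.

```python
# Keys that the ValueError above reports as incompatible in TOIL_TORQUE_REQS.
INCOMPATIBLE_KEYS = ("mem=", "nodes=", "ppn=")


def find_incompatible_keys(reqline):
    """Return the incompatible keys found in a TOIL_TORQUE_REQS-style string.

    Hypothetical helper: it uses plain substring matching, which is an
    assumption about how the check behaves, not a copy of Toil's code.
    """
    return [key for key in INCOMPATIBLE_KEYS if key in reqline]


def strip_incompatible(reqline):
    """Drop every comma-separated entry that mentions an incompatible key."""
    kept = [
        part
        for part in reqline.split(",")
        if not any(key in part for key in INCOMPATIBLE_KEYS)
    ]
    return ",".join(kept)


if __name__ == "__main__":
    # The value exported in the question.
    reqline = "walltime=48:00:00,mem=192gb,nodes=2:ppn=16"
    print(find_incompatible_keys(reqline))
    print(strip_incompatible(reqline))
```

Under these assumptions, only the walltime entry of the original value would remain in TOIL_TORQUE_REQS, with memory and core counts left to each CWL step's ResourceRequirement.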

Issue is synchronized with this Jira Task. Issue Number: TOIL-680

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 26 (26 by maintainers)

Top GitHub Comments

1 reaction
altairwei commented, Oct 6, 2020

If the ResourceRequirement of a CWL step exceeds the hardware resources of a single compute node, will the workflow fail immediately?

That step would not get scheduled by your batch system (torque, in your case) on a node that was too small. If no node was available then the workflow would eventually fail, I guess?

Torque should send the big workflow step to the “fat node” automatically as long as they are in the same queue.

Thank you for your patience in explaining this! @mr-c

0 reactions
altairwei commented, Oct 13, 2020

If I change the reference format of PBS_JOBID:

            stdoutfile = self.boss.formatStdOutErrPath(jobID, 'torque', r'${PBS_JOBID}', 'std_output')
            stderrfile = self.boss.formatStdOutErrPath(jobID, 'torque', r'${PBS_JOBID}', 'std_error')

The error then changes to:

Job 'file:///public/home/graham/src/rhapsody-wta/workflow/rhapsody_wta_1.8.packed.cwl#InternalSettings.cwl' kind-file_public_home_graham_src_rhapsody-wta_workflow_rhapsody_wta_1.8.packed.cwl_InternalSettings.cwl/instance-k9appcyh with ID kind-file_public_home_graham_src_rhapsody-wta_workflow_rhapsody_wta_1.8.packed.cwl_InternalSettings.cwl/instance-k9appcyh is completely failed
Job failed with exit value 120: 'file:///public/home/graham/src/rhapsody-wta/workflow/rhapsody_wta_1.8.packed.cwl#PutativeCellSettings.cwl' kind-file_public_home_graham_src_rhapsody-wta_workflow_rhapsody_wta_1.8.packed.cwl_PutativeCellSettings.cwl/instance-jinp7mis
No log file is present, despite job failing: 'file:///public/home/graham/src/rhapsody-wta/workflow/rhapsody_wta_1.8.packed.cwl#PutativeCellSettings.cwl' kind-file_public_home_graham_src_rhapsody-wta_workflow_rhapsody_wta_1.8.packed.cwl_PutativeCellSettings.cwl/instance-jinp7mis
Due to failure we are reducing the remaining retry count of job 'file:///public/home/graham/src/rhapsody-wta/workflow/rhapsody_wta_1.8.packed.cwl#PutativeCellSettings.cwl' kind-file_public_home_graham_src_rhapsody-wta_workflow_rhapsody_wta_1.8.packed.cwl_PutativeCellSettings.cwl/instance-jinp7mis with ID kind-file_public_home_graham_src_rhapsody-wta_workflow_rhapsody_wta_1.8.packed.cwl_PutativeCellSettings.cwl/instance-jinp7mis to 0
Job 'file:///public/home/graham/src/rhapsody-wta/workflow/rhapsody_wta_1.8.packed.cwl#PutativeCellSettings.cwl' kind-file_public_home_graham_src_rhapsody-wta_workflow_rhapsody_wta_1.8.packed.cwl_PutativeCellSettings.cwl/instance-jinp7mis with ID kind-file_public_home_graham_src_rhapsody-wta_workflow_rhapsody_wta_1.8.packed.cwl_PutativeCellSettings.cwl/instance-jinp7mis is completely failed
Job failed with exit value 120: 'file:///public/home/graham/src/rhapsody-wta/workflow/rhapsody_wta_1.8.packed.cwl#MultiplexingSettings.cwl' kind-file_public_home_graham_src_rhapsody-wta_workflow_rhapsody_wta_1.8.packed.cwl_MultiplexingSettings.cwl/instance-lvfnv1bw
No log file is present, despite job failing: 'file:///public/home/graham/src/rhapsody-wta/workflow/rhapsody_wta_1.8.packed.cwl#MultiplexingSettings.cwl' kind-file_public_home_graham_src_rhapsody-wta_workflow_rhapsody_wta_1.8.packed.cwl_MultiplexingSettings.cwl/instance-lvfnv1bw
Due to failure we are reducing the remaining retry count of job 'file:///public/home/graham/src/rhapsody-wta/workflow/rhapsody_wta_1.8.packed.cwl#MultiplexingSettings.cwl' kind-file_public_home_graham_src_rhapsody-wta_workflow_rhapsody_wta_1.8.packed.cwl_MultiplexingSettings.cwl/instance-lvfnv1bw with ID kind-file_public_home_graham_src_rhapsody-wta_workflow_rhapsody_wta_1.8.packed.cwl_MultiplexingSettings.cwl/instance-lvfnv1bw to 0
Job 'file:///public/home/graham/src/rhapsody-wta/workflow/rhapsody_wta_1.8.packed.cwl#MultiplexingSettings.cwl' kind-file_public_home_graham_src_rhapsody-wta_workflow_rhapsody_wta_1.8.packed.cwl_MultiplexingSettings.cwl/instance-lvfnv1bw with ID kind-file_public_home_graham_src_rhapsody-wta_workflow_rhapsody_wta_1.8.packed.cwl_MultiplexingSettings.cwl/instance-lvfnv1bw is completely failed
Job failed with exit value 120: 'file:///public/home/graham/src/rhapsody-wta/workflow/rhapsody_wta_1.8.packed.cwl#SubsampleSettings.cwl' kind-file_public_home_graham_src_rhapsody-wta_workflow_rhapsody_wta_1.8.packed.cwl_SubsampleSettings.cwl/instance-eq9vdegq
No log file is present, despite job failing: 'file:///public/home/graham/src/rhapsody-wta/workflow/rhapsody_wta_1.8.packed.cwl#SubsampleSettings.cwl' kind-file_public_home_graham_src_rhapsody-wta_workflow_rhapsody_wta_1.8.packed.cwl_SubsampleSettings.cwl/instance-eq9vdegq
Due to failure we are reducing the remaining retry count of job 'file:///public/home/graham/src/rhapsody-wta/workflow/rhapsody_wta_1.8.packed.cwl#SubsampleSettings.cwl' kind-file_public_home_graham_src_rhapsody-wta_workflow_rhapsody_wta_1.8.packed.cwl_SubsampleSettings.cwl/instance-eq9vdegq with ID kind-file_public_home_graham_src_rhapsody-wta_workflow_rhapsody_wta_1.8.packed.cwl_SubsampleSettings.cwl/instance-eq9vdegq to 0
Job 'file:///public/home/graham/src/rhapsody-wta/workflow/rhapsody_wta_1.8.packed.cwl#SubsampleSettings.cwl' kind-file_public_home_graham_src_rhapsody-wta_workflow_rhapsody_wta_1.8.packed.cwl_SubsampleSettings.cwl/instance-eq9vdegq with ID kind-file_public_home_graham_src_rhapsody-wta_workflow_rhapsody_wta_1.8.packed.cwl_SubsampleSettings.cwl/instance-eq9vdegq is completely failed
Finished toil run with 5 failed jobs.
Failed jobs at end of the run: 'CWLWorkflow' kind-CWLWorkflow/instance-7_r7n9ze 'file:///public/home/graham/src/rhapsody-wta/workflow/rhapsody_wta_1.8.packed.cwl#InternalSettings.cwl' kind-file_public_home_graham_src_rhapsody-wta_workflow_rhapsody_wta_1.8.packed.cwl_InternalSettings.cwl/instance-k9appcyh 'file:///public/home/graham/src/rhapsody-wta/workflow/rhapsody_wta_1.8.packed.cwl#SubsampleSettings.cwl' kind-file_public_home_graham_src_rhapsody-wta_workflow_rhapsody_wta_1.8.packed.cwl_SubsampleSettings.cwl/instance-eq9vdegq 'file:///public/home/graham/src/rhapsody-wta/workflow/rhapsody_wta_1.8.packed.cwl#PutativeCellSettings.cwl' kind-file_public_home_graham_src_rhapsody-wta_workflow_rhapsody_wta_1.8.packed.cwl_PutativeCellSettings.cwl/instance-jinp7mis 'file:///public/home/graham/src/rhapsody-wta/workflow/rhapsody_wta_1.8.packed.cwl#MultiplexingSettings.cwl' kind-file_public_home_graham_src_rhapsody-wta_workflow_rhapsody_wta_1.8.packed.cwl_MultiplexingSettings.cwl/instance-lvfnv1bw
