toil-wdl-runner inconsistencies
Hello,
I’m trying to use Toil as a back-end engine to run this WDL pipeline, in light of the supported features mentioned in the Toil documentation here.
This script has been tested with cromwell-37 and runs as expected. However, I couldn’t get it to work with toil-wdl-runner. From debugging, I reached a few conclusions, but I would like to hear from you nonetheless.
- Documentation fix: With WDL v1.0 being out and supported by Cromwell, it would help readers to know that Toil does not actually support WDL v1.0 (so, for example, the version statement should not be there, and inputs need to be listed directly before the command within a task rather than enclosed in an explicit input { and }, … etc.).
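As a rough pre-flight check for the two WDL v1.0 constructs named above, a script along these lines could flag them before the file reaches toil-wdl-runner. This is only a sketch: find_v1_constructs and the regular expressions are hypothetical and cover just the version statement and explicit input { } blocks, not the full set of v1.0 differences.

```python
import re

# Hypothetical heuristics for the two WDL v1.0 constructs mentioned above.
# They are line-based pattern matches, not a real WDL parser.
CHECKS = [
    ("version statement", re.compile(r"^\s*version\s+\S+")),
    ("explicit input { } block", re.compile(r"^\s*input\s*\{")),
]


def find_v1_constructs(wdl_text):
    """Return (line_number, label) pairs for lines that look like WDL v1.0."""
    findings = []
    for number, line in enumerate(wdl_text.splitlines(), start=1):
        for label, pattern in CHECKS:
            if pattern.search(line):
                findings.append((number, label))
    return findings


sample = """version 1.0
task hello {
  input {
    String name
  }
  command {
    echo ${name}
  }
}
"""

findings = find_v1_constructs(sample)
for number, label in findings:
    print("line {}: {}".format(number, label))
```

A call-level `input:` clause (with a colon) is not matched, since only the brace form is checked.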
- A note to self: For some of my inputs, it was appropriate to pass some parameters as -o "'${BWAExtraOptionsString}'" within the command block of the WDL task definition. The Toil runner was not happy and generated the rather uninformative error message below. It took a minute to relate it to this part, but I thought to mention it nonetheless.
File "/home/azza/github_repos/varCall/toilwdl_compiled.py", line 574
command49 = r''' -o "''''
^
SyntaxError: EOL while scanning string literal
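The error can be reproduced outside Toil. The sketch below assumes, based only on the generated line shown in the error (not on Toil's source), that the text preceding ${BWAExtraOptionsString} is wrapped in a raw triple-single-quoted literal; quote_naive is a hypothetical stand-in for that step. A fragment ending in a single quote then closes the literal early and leaves a stray quote behind, whereas repr() produces a literal that always parses.

```python
def quote_naive(fragment):
    """Hypothetical stand-in: wrap text in a raw triple-single-quoted literal."""
    return "r'''" + fragment + "'''"


def compiles(source):
    """Return True if `source` parses as a Python expression."""
    try:
        compile(source, "<generated>", "eval")
        return True
    except SyntaxError:
        return False


# The command text that precedes ${BWAExtraOptionsString} in the task above.
fragment = ' -o "\''

naive = quote_naive(fragment)  # same text as the failing generated literal
safe = repr(fragment)          # repr() escapes whichever quote it must

naive_ok = compiles(naive)     # the trailing quote breaks the literal
safe_ok = compiles(safe)
round_trip = eval(safe) == fragment

print(naive, naive_ok)
print(safe, safe_ok, round_trip)
```

On the WDL side, avoiding a single quote immediately before or after a ${...} placeholder may sidestep the problem, though that is untested here.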
- Conditionals: Control flow via if statements is supported by Toil.
- Scatter: Scatter logic is supported by Toil, as correctly mentioned in the Toil documentation.
- Conditionals nested within scatter: An if statement nested within a WDL scatter block is not evaluated. For example, the parsing/compilation of my WDL script above (into toilwdl_compiled.py) breaks on such an occurrence. Below are the error message, its context within the generated code, and a stripped-down version of the corresponding WDL code:
# The error message:
File "/home/azza/github_repos/varCall/toilwdl_compiled.py", line 2303
rvDict = {}
^
IndentationError: expected an indented block
# Relevant lines in toilwdl_compiled.py:
2301 for lane in NormalInputReads:
2302
2303 rvDict = {}
2304 return rvDict
# The nested scatter-if block in the wdl script:
scatter (lane in NormalInputReads) {
if(PairedEnd) {
call trimsequencesTask as TRIMSEQ_paired
}
if(!PairedEnd) {
call trimsequencesTask as TRIMSEQ_single
}
}
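The IndentationError reported for line 2303 is what Python raises when a for header is followed by no indented statement. The sketch below assumes (this is a guess from the generated lines, not from Toil's source) that nothing was emitted for the nested if blocks, leaving the loop body empty; render_loop and try_compile are hypothetical helpers showing the failure and one conventional guard, padding an empty body with pass.

```python
def render_loop(header, body_lines, pad_empty):
    """Render a for-loop followed by `rvDict = {}`.

    When `pad_empty` is true, an empty body is replaced by a single `pass`.
    """
    if not body_lines and pad_empty:
        body_lines = ["pass"]
    lines = [header] + ["    " + line for line in body_lines]
    lines.append("rvDict = {}")
    return "\n".join(lines) + "\n"


def try_compile(source):
    """Compile without executing; return 'ok' or the error class name."""
    try:
        compile(source, "<generated>", "exec")
        return "ok"
    except SyntaxError as err:  # IndentationError is a SyntaxError subclass
        return type(err).__name__


header = "for lane in NormalInputReads:"
broken = try_compile(render_loop(header, [], pad_empty=False))
fixed = try_compile(render_loop(header, [], pad_empty=True))

print(broken)
print(fixed)
```

Padding with pass only makes the generated file parse; the calls inside the if blocks would still not run.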
- Help needed: Following the previous observation, I removed the conditionals from the scatter blocks, yet I still cannot get past the first scatter block. I also stripped the script down to the minimum (GermlineMasterWorkflow_toilsimplified.wdl.txt, with the corresponding JSON germlinemasterworkflow_toil.json.txt). Even so, I get the error message below, which I cannot make sense of. Would you be able to help with this? I’m not sure whether there is a way to get a more informative error log.
$ toil-wdl-runner GermlineMasterWorkflow_toilsimplified.wdl germlinemasterworkflow_toil.json
INFO:toil.wdl.wdl_functions:Importing /home/azza/github_repos/varCall/Inputs/HG00120.lowcoverage.chr20.smallregion_1.fastq.gz into the jobstore.
INFO:toil.wdl.wdl_functions:Importing /home/azza/github_repos/varCall/Inputs/HG00120.lowcoverage.chr20.smallregion_2.fastq.gz into the jobstore.
WARNING:toil.batchSystems.singleMachine:Limiting maxCores to CPU count of system (4).
WARNING:toil.batchSystems.singleMachine:Limiting maxMemory to physically available memory (6141247488).
WARNING:toil.batchSystems.singleMachine:Limiting maxDisk to physically available disk (5690355712).
INFO:toil:Running Toil version 3.19.0-0feb1d4d1b4fc66062fc4dbc5d8f7fb046df39e6.
INFO:toil.leader:Issued job 'EncapsulatedJob' R/l/jobPAfDuE with job batch system ID: 0 and cores: 1, disk: 2.0 G, and memory: 2.0 G
DEBUG:toil.jobStores.fileJobStore:Path to job store directory is '/home/azza/github_repos/varCall/toil-tsts/toilWorkflowRun'.
INFO:toil.leader:Job ended successfully: 'EncapsulatedJob' R/l/jobPAfDuE
WARNING:toil.leader:The job seems to have left a log file, indicating failure: 'alignmentTaskCls' R/l/jobPAfDuE
WARNING:toil.leader:R/l/jobPAfDuE INFO:toil.worker:---TOIL WORKER OUTPUT LOG---
WARNING:toil.leader:R/l/jobPAfDuE INFO:toil:Running Toil version 3.19.0-0feb1d4d1b4fc66062fc4dbc5d8f7fb046df39e6.
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.resource:'JTRES_148fe75160b89a3a979fcad3e835d3c1' may exist, but is not yet referenced by the worker (KeyError from os.environ[]).
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.fileStore:Starting job R/l/jobPAfDuE/g/tmpFtgESg-_serialiseJob-stream with less than 10% of disk space remaining.
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.resource:'JTRES_148fe75160b89a3a979fcad3e835d3c1' may exist, but is not yet referenced by the worker (KeyError from os.environ[]).
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.resource:'JTRES_148fe75160b89a3a979fcad3e835d3c1' may exist, but is not yet referenced by the worker (KeyError from os.environ[]).
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.fileStore:Starting job R/l/jobPAfDuE/g/tmpt1yL4w-_serialiseJob-stream with less than 10% of disk space remaining.
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.resource:'JTRES_148fe75160b89a3a979fcad3e835d3c1' may exist, but is not yet referenced by the worker (KeyError from os.environ[]).
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.resource:'JTRES_148fe75160b89a3a979fcad3e835d3c1' may exist, but is not yet referenced by the worker (KeyError from os.environ[]).
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.fileStore:Starting job R/l/jobPAfDuE/g/tmpbmanrZ-_serialiseJob-stream with less than 10% of disk space remaining.
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.resource:'JTRES_148fe75160b89a3a979fcad3e835d3c1' may exist, but is not yet referenced by the worker (KeyError from os.environ[]).
WARNING:toil.leader:R/l/jobPAfDuE INFO:toil.fileStore:LOG-TO-MASTER: initialize_jobs
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.fileStore:Starting job R/l/jobPAfDuE/g/tmpk2QR4n-_serialiseJob-stream with less than 10% of disk space remaining.
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.resource:'JTRES_148fe75160b89a3a979fcad3e835d3c1' may exist, but is not yet referenced by the worker (KeyError from os.environ[]).
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.resource:'JTRES_148fe75160b89a3a979fcad3e835d3c1' may exist, but is not yet referenced by the worker (KeyError from os.environ[]).
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.fileStore:Starting job R/l/jobPAfDuE/g/tmpCS_0ka-_serialiseJob-stream with less than 10% of disk space remaining.
WARNING:toil.leader:R/l/jobPAfDuE INFO:toil.fileStore:LOG-TO-MASTER: scatter0
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.resource:Can't globalize module ModuleDescriptor(dirPath='/home/azza/github_repos/varCall/toil-tsts', name='toilwdl_compiled', fromVirtualEnv=False).
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.resource:Can't globalize module ModuleDescriptor(dirPath='/home/azza/github_repos/varCall/toil-tsts', name='toilwdl_compiled', fromVirtualEnv=False).
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.resource:Can't globalize module ModuleDescriptor(dirPath='/home/azza/github_repos/varCall/toil-tsts', name='toilwdl_compiled', fromVirtualEnv=False).
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.resource:'JTRES_148fe75160b89a3a979fcad3e835d3c1' may exist, but is not yet referenced by the worker (KeyError from os.environ[]).
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.resource:'JTRES_148fe75160b89a3a979fcad3e835d3c1' may exist, but is not yet referenced by the worker (KeyError from os.environ[]).
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.fileStore:Starting job R/l/jobPAfDuE/g/tmp4xc6aB-_serialiseJob-stream with less than 10% of disk space remaining.
WARNING:toil.leader:R/l/jobPAfDuE INFO:toil.fileStore:LOG-TO-MASTER: trimsequencesTask
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.fileStore:Starting job R/l/jobPAfDuE/g/tmpAdpGKZ-_serialiseJob-stream with less than 10% of disk space remaining.
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.resource:'JTRES_148fe75160b89a3a979fcad3e835d3c1' may exist, but is not yet referenced by the worker (KeyError from os.environ[]).
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.resource:'JTRES_148fe75160b89a3a979fcad3e835d3c1' may exist, but is not yet referenced by the worker (KeyError from os.environ[]).
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.fileStore:Starting job R/l/jobPAfDuE/g/tmpQOpmvQ-_serialiseJob-stream with less than 10% of disk space remaining.
WARNING:toil.leader:R/l/jobPAfDuE INFO:toil.fileStore:LOG-TO-MASTER: scatter1
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.resource:Can't globalize module ModuleDescriptor(dirPath='/home/azza/github_repos/varCall/toil-tsts', name='toilwdl_compiled', fromVirtualEnv=False).
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.resource:Can't globalize module ModuleDescriptor(dirPath='/home/azza/github_repos/varCall/toil-tsts', name='toilwdl_compiled', fromVirtualEnv=False).
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.resource:Can't globalize module ModuleDescriptor(dirPath='/home/azza/github_repos/varCall/toil-tsts', name='toilwdl_compiled', fromVirtualEnv=False).
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.resource:'JTRES_148fe75160b89a3a979fcad3e835d3c1' may exist, but is not yet referenced by the worker (KeyError from os.environ[]).
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.resource:'JTRES_148fe75160b89a3a979fcad3e835d3c1' may exist, but is not yet referenced by the worker (KeyError from os.environ[]).
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.fileStore:Starting job R/l/jobPAfDuE/g/tmpb8LznN-_serialiseJob-stream with less than 10% of disk space remaining.
WARNING:toil.leader:R/l/jobPAfDuE INFO:toil.fileStore:LOG-TO-MASTER: alignmentTask
WARNING:toil.leader:R/l/jobPAfDuE Traceback (most recent call last):
WARNING:toil.leader:R/l/jobPAfDuE File "/home/azza/toilenv/lib/python2.7/site-packages/toil/worker.py", line 324, in workerScript
WARNING:toil.leader:R/l/jobPAfDuE job._runner(jobGraph=jobGraph, jobStore=jobStore, fileStore=fileStore)
WARNING:toil.leader:R/l/jobPAfDuE File "/home/azza/toilenv/lib/python2.7/site-packages/toil/job.py", line 1351, in _runner
WARNING:toil.leader:R/l/jobPAfDuE returnValues = self._run(jobGraph, fileStore)
WARNING:toil.leader:R/l/jobPAfDuE File "/home/azza/toilenv/lib/python2.7/site-packages/toil/job.py", line 1296, in _run
WARNING:toil.leader:R/l/jobPAfDuE return self.run(fileStore)
WARNING:toil.leader:R/l/jobPAfDuE File "/home/azza/github_repos/varCall/toil-tsts/toilwdl_compiled.py", line 668, in run
WARNING:toil.leader:R/l/jobPAfDuE OutputBams = process_outfile(output_filename, fileStore, tempDir, '/home/azza/github_repos/varCall/toil-tsts')
WARNING:toil.leader:R/l/jobPAfDuE File "/home/azza/toilenv/lib/python2.7/site-packages/toil/wdl/wdl_functions.py", line 277, in process_outfile
WARNING:toil.leader:R/l/jobPAfDuE return process_single_outfile(f, fileStore, workDir, outDir)
WARNING:toil.leader:R/l/jobPAfDuE File "/home/azza/toilenv/lib/python2.7/site-packages/toil/wdl/wdl_functions.py", line 259, in process_single_outfile
WARNING:toil.leader:R/l/jobPAfDuE '{}\n'.format(f, tmp, exe))
WARNING:toil.leader:R/l/jobPAfDuE RuntimeError: OUTPUT FILE: HG00120.bam was not found!
WARNING:toil.leader:R/l/jobPAfDuE total 4.3M
WARNING:toil.leader:R/l/jobPAfDuE drwx------ 3 azza azza 4.0K Apr 10 20:38 .
WARNING:toil.leader:R/l/jobPAfDuE drwxrwxrwx 3 azza azza 4.0K Apr 10 20:38 ..
WARNING:toil.leader:R/l/jobPAfDuE drwxrwxr-x 2 azza azza 4.0K Apr 10 20:38 execution
WARNING:toil.leader:R/l/jobPAfDuE -rw-rw-r-- 1 azza azza 2.2M Apr 10 20:38 HG00120.lowcoverage.chr20.smallregion_1.fastq.gz
WARNING:toil.leader:R/l/jobPAfDuE -rw-rw-r-- 1 azza azza 2.2M Apr 10 20:38 HG00120.lowcoverage.chr20.smallregion_2.fastq.gz
WARNING:toil.leader:R/l/jobPAfDuE
WARNING:toil.leader:R/l/jobPAfDuE
WARNING:toil.leader:R/l/jobPAfDuE total 8.0K
WARNING:toil.leader:R/l/jobPAfDuE drwxrwxr-x 2 azza azza 4.0K Apr 10 20:38 .
WARNING:toil.leader:R/l/jobPAfDuE drwx------ 3 azza azza 4.0K Apr 10 20:38 ..
WARNING:toil.leader:R/l/jobPAfDuE
WARNING:toil.leader:R/l/jobPAfDuE
WARNING:toil.leader:R/l/jobPAfDuE ERROR:toil.worker:Exiting the worker because of a failed job on host azza-Satellite-P845
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.jobGraph:Due to failure we are reducing the remaining retry count of job 'alignmentTaskCls' R/l/jobPAfDuE with ID R/l/jobPAfDuE to 1
INFO:toil.leader:Issued job 'alignmentTaskCls' R/l/jobPAfDuE with job batch system ID: 1 and cores: 1, disk: 2.0 G, and memory: 2.0 G
DEBUG:toil.jobStores.fileJobStore:Path to job store directory is '/home/azza/github_repos/varCall/toil-tsts/toilWorkflowRun'.
INFO:toil.leader:Job ended successfully: 'alignmentTaskCls' R/l/jobPAfDuE
WARNING:toil.leader:The job seems to have left a log file, indicating failure: 'alignmentTaskCls' R/l/jobPAfDuE
WARNING:toil.leader:R/l/jobPAfDuE INFO:toil.worker:---TOIL WORKER OUTPUT LOG---
WARNING:toil.leader:R/l/jobPAfDuE INFO:toil:Running Toil version 3.19.0-0feb1d4d1b4fc66062fc4dbc5d8f7fb046df39e6.
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.resource:'JTRES_148fe75160b89a3a979fcad3e835d3c1' may exist, but is not yet referenced by the worker (KeyError from os.environ[]).
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.fileStore:Starting job R/l/jobPAfDuE/g/tmpb8LznN-_serialiseJob-stream with less than 10% of disk space remaining.
WARNING:toil.leader:R/l/jobPAfDuE INFO:toil.fileStore:LOG-TO-MASTER: alignmentTask
WARNING:toil.leader:R/l/jobPAfDuE Traceback (most recent call last):
WARNING:toil.leader:R/l/jobPAfDuE File "/home/azza/toilenv/lib/python2.7/site-packages/toil/worker.py", line 324, in workerScript
WARNING:toil.leader:R/l/jobPAfDuE job._runner(jobGraph=jobGraph, jobStore=jobStore, fileStore=fileStore)
WARNING:toil.leader:R/l/jobPAfDuE File "/home/azza/toilenv/lib/python2.7/site-packages/toil/job.py", line 1351, in _runner
WARNING:toil.leader:R/l/jobPAfDuE returnValues = self._run(jobGraph, fileStore)
WARNING:toil.leader:R/l/jobPAfDuE File "/home/azza/toilenv/lib/python2.7/site-packages/toil/job.py", line 1296, in _run
WARNING:toil.leader:R/l/jobPAfDuE return self.run(fileStore)
WARNING:toil.leader:R/l/jobPAfDuE File "/home/azza/github_repos/varCall/toil-tsts/toilwdl_compiled.py", line 668, in run
WARNING:toil.leader:R/l/jobPAfDuE OutputBams = process_outfile(output_filename, fileStore, tempDir, '/home/azza/github_repos/varCall/toil-tsts')
WARNING:toil.leader:R/l/jobPAfDuE File "/home/azza/toilenv/lib/python2.7/site-packages/toil/wdl/wdl_functions.py", line 277, in process_outfile
WARNING:toil.leader:R/l/jobPAfDuE return process_single_outfile(f, fileStore, workDir, outDir)
WARNING:toil.leader:R/l/jobPAfDuE File "/home/azza/toilenv/lib/python2.7/site-packages/toil/wdl/wdl_functions.py", line 259, in process_single_outfile
WARNING:toil.leader:R/l/jobPAfDuE '{}\n'.format(f, tmp, exe))
WARNING:toil.leader:R/l/jobPAfDuE RuntimeError: OUTPUT FILE: HG00120.bam was not found!
WARNING:toil.leader:R/l/jobPAfDuE total 4.3M
WARNING:toil.leader:R/l/jobPAfDuE drwx------ 3 azza azza 4.0K Apr 10 20:38 .
WARNING:toil.leader:R/l/jobPAfDuE drwxrwxrwx 3 azza azza 4.0K Apr 10 20:38 ..
WARNING:toil.leader:R/l/jobPAfDuE drwxrwxr-x 2 azza azza 4.0K Apr 10 20:38 execution
WARNING:toil.leader:R/l/jobPAfDuE -rw-rw-r-- 1 azza azza 2.2M Apr 10 20:38 HG00120.lowcoverage.chr20.smallregion_1.fastq.gz
WARNING:toil.leader:R/l/jobPAfDuE -rw-rw-r-- 1 azza azza 2.2M Apr 10 20:38 HG00120.lowcoverage.chr20.smallregion_2.fastq.gz
WARNING:toil.leader:R/l/jobPAfDuE
WARNING:toil.leader:R/l/jobPAfDuE
WARNING:toil.leader:R/l/jobPAfDuE total 8.0K
WARNING:toil.leader:R/l/jobPAfDuE drwxrwxr-x 2 azza azza 4.0K Apr 10 20:38 .
WARNING:toil.leader:R/l/jobPAfDuE drwx------ 3 azza azza 4.0K Apr 10 20:38 ..
WARNING:toil.leader:R/l/jobPAfDuE
WARNING:toil.leader:R/l/jobPAfDuE
WARNING:toil.leader:R/l/jobPAfDuE ERROR:toil.worker:Exiting the worker because of a failed job on host azza-Satellite-P845
WARNING:toil.leader:R/l/jobPAfDuE WARNING:toil.jobGraph:Due to failure we are reducing the remaining retry count of job 'alignmentTaskCls' R/l/jobPAfDuE with ID R/l/jobPAfDuE to 0
WARNING:toil.leader:Job 'alignmentTaskCls' R/l/jobPAfDuE with ID R/l/jobPAfDuE is completely failed
INFO:toil.leader:Finished toil run with 1 failed jobs.
INFO:toil.leader:Failed jobs at end of the run: 'alignmentTaskCls' R/l/jobPAfDuE
INFO:toil.common:Successfully deleted the job store: FileJobStore(/home/azza/github_repos/varCall/toil-tsts/toilWorkflowRun)
Traceback (most recent call last):
File "/home/azza/github_repos/varCall/toil-tsts/toilwdl_compiled.py", line 2368, in <module>
fileStore.start(job0)
File "/home/azza/toilenv/lib/python2.7/site-packages/toil/common.py", line 771, in start
return self._runMainLoop(rootJobGraph)
File "/home/azza/toilenv/lib/python2.7/site-packages/toil/common.py", line 1044, in _runMainLoop
jobCache=self._jobCache).run()
File "/home/azza/toilenv/lib/python2.7/site-packages/toil/leader.py", line 245, in run
raise FailedJobsException(self.config.jobStore, self.toilState.totalFailedJobs, self.jobStore)
toil.leader.FailedJobsException
Traceback (most recent call last):
File "/home/azza/toilenv/bin/toil-wdl-runner", line 10, in <module>
sys.exit(main())
File "/home/azza/toilenv/lib/python2.7/site-packages/toil/wdl/toilwdl.py", line 146, in main
subprocess.check_call(cmd)
File "/home/azza/toilenv/lib/python2.7/site-packages/subprocess32.py", line 307, in check_call
raise CalledProcessError(retcode, cmd)
subprocess32.CalledProcessError: Command '['python', '/home/azza/github_repos/varCall/toil-tsts/toilwdl_compiled.py']' returned non-zero exit status 1.
Would you be able to comment on these issues, and maybe help with debugging the last error message?
Thank you Azza
====
- My environment:
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.5 LTS
Release: 16.04
Codename: xenial
- Toil version 3.19.0-0feb1d4d1b4fc66062fc4dbc5d8f7fb046df39e6 (from the execution log)
┆Issue is synchronized with this Jira Story ┆Epic: Improved WDL support and more complex conformance tests Q5 ┆Issue Number: TOIL-37
Top GitHub Comments
@azzaea You’re right, we haven’t nailed down exactly what works. We have a few wdl scripts that run as a part of our testing, including all of the tutorial wdl scripts from their site and an ENCODE workflow I believe, which I admit is not very helpful.
If I get bandwidth in the coming weeks, there might be an attempt to shore this up and address your points, to see about saving the logs as well as providing better information in those logs. If so, the documentation will be updated.
Thanks for your feedback on this, it’s much appreciated!
Hello @DailyDreaming !
Just thought to report on a relevant pattern that doesn’t seem supported. While simple Scattering of tasks works, chaining tasks within a Scatter does not. Below is an example snippet that works fine (with toil-wdl-runner) outside a Scatter, or if a single task is called, but not via a chain: