Job never stops with JavaScript input
Hello there, we are facing an issue with a workflow that uses this type of CWL file.
The problem appears only when I'm using “InlineJavascriptRequirement” with this kind of JavaScript expression:
my_step:
  run: ./my_sub_cwl.cwl
  in:
    # Javascript sub selection
    array_input:
      source: array_input
      valueFrom: |
        ${
          for (var i = 0; i < self.length; i++) {
            if (self[i].basename.includes("<here I'm putting any string in order to filter>")) {
              return self[i]
            }
          }
          return self[0]
        }
  out: [ out1, out2]
I also tried another notation, but the exact same issue appears:
valueFrom: $(inputs.array_input.filter(f => f.basename.includes("<here I'm putting any string in order to filter>"))[0])
The actual task finishes correctly (with return_code = 0), the input File generated by the JavaScript expression is properly set, and the file is retrieved.
BUT
The job never “finishes” properly and loops over this message indefinitely:
[2021-12-16T14:48:28+0000] [Thread-2 ] [W] [toil.batchSystems.singleMachine] Sent redundant job completion kill to surviving process group 615 known to batch system 140180515079744
Important notes:
- We were not facing this issue with Toil v5.3.0, only after updating to v5.5.0 (we didn't try 5.4.0).
- I'm not facing the issue if I use the following expression:
valueFrom: $(self[0])
I have not found any similar issue so far, and I don't know what to try in order to fix it.
Any help would be much appreciated.
Thanks a lot in advance 😃, Etienne
┆Issue is synchronized with this Jira Task ┆friendlyId: TOIL-1110
Top GitHub Comments
I did some testing and I don't think the signal handler in Toil.__enter__ would override the handlers in the _toil_worker process. (I also started https://github.com/DataBiosphere/toil/compare/issues/3965-user-provided-exit-handler but I don't think it'll help much here.)

However, the problem is that a SIGTERM signal on the leader process might not propagate to its workers, so I don't know the best way to handle interrupts like this on the worker. I can look more into this after winter break.

We use processes_to_kill:
https://github.com/common-workflow-language/cwltool/blob/041cc0eb8f0272846e5b3e685fe51367f3ef93a6/cwltool/sandboxjs.py#L18
https://github.com/common-workflow-language/cwltool/blob/041cc0eb8f0272846e5b3e685fe51367f3ef93a6/cwltool/utils.py#L58
Each nodejs subprocess is appended to it, whether non-containerized
https://github.com/common-workflow-language/cwltool/blob/041cc0eb8f0272846e5b3e685fe51367f3ef93a6/cwltool/sandboxjs.py#L76
or containerized
https://github.com/common-workflow-language/cwltool/blob/041cc0eb8f0272846e5b3e685fe51367f3ef93a6/cwltool/sandboxjs.py#L159
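For readers unfamiliar with the pattern, here is a minimal sketch of a process registry of the kind described above: every helper subprocess is appended to a shared queue as it is spawned, and a cleanup routine drains the queue. This is an illustration only, not cwltool's actual code; the names tracked_processes, spawn_tracked, and terminate_tracked are hypothetical.

```python
import collections
import subprocess
import sys
from typing import Deque, List

# Hypothetical stand-in for a module-level registry of helper processes
# (plays the role that processes_to_kill plays in the discussion above).
tracked_processes: Deque["subprocess.Popen[bytes]"] = collections.deque()


def spawn_tracked(args: List[str]) -> "subprocess.Popen[bytes]":
    """Start a subprocess and remember it so it can be cleaned up later."""
    proc = subprocess.Popen(args)
    tracked_processes.append(proc)
    return proc


def terminate_tracked(timeout: float = 5.0) -> int:
    """Terminate and reap every tracked process; return how many were handled."""
    handled = 0
    while tracked_processes:
        proc = tracked_processes.popleft()
        if proc.poll() is None:  # still running
            proc.terminate()
            try:
                proc.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                # Escalate if the process ignores the polite request.
                proc.kill()
                proc.wait()
        handled += 1
    return handled


if __name__ == "__main__":
    # A long-running child stands in for a lingering JavaScript engine process.
    spawn_tracked([sys.executable, "-c", "import time; time.sleep(60)"])
    print("cleaned up", terminate_tracked(), "process(es)")
```

If nothing ever calls the cleanup routine, a long-lived helper process like this one outlives the job that started it, which would be consistent with the "surviving process group" warning in the original report.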
So toil-cwl-runner and _toil_worker both need to call https://github.com/common-workflow-language/cwltool/blob/041cc0eb8f0272846e5b3e685fe51367f3ef93a6/cwltool/main.py#L103 from their shutdown routines, and also from an interrupt handler like https://github.com/common-workflow-language/cwltool/blob/041cc0eb8f0272846e5b3e685fe51367f3ef93a6/cwltool/main.py#L137 via https://github.com/common-workflow-language/cwltool/blob/041cc0eb8f0272846e5b3e685fe51367f3ef93a6/cwltool/main.py#L1460

Ah-ha, good point. We need to fix that in cwltool, yes.