In-place updates of files in CWL don't seem to work when run manually without caching
On Courtyard, with commit 5b70540708973d0116d73605ee7daa3b171481cf (current master branch), when I run this particular test from the CWL 1.1 conformance tests, I get the wrong answer when caching is off:
```
git clone https://github.com/common-workflow-language/cwl-v1.1.git ~/build/toil/src/toil/test/cwl/spec_v11 && cd ~/build/toil/src/toil/test/cwl/spec_v11 && git checkout 664835e83eb5e57eee18a04ce7b05fb9d70d77b7
toil-cwl-runner --disableCaching=True --clean=always --outdir=/tmp/tmp7fb6na8h --workDir=/tmp/adamnovak-toil/test --quiet tests/inp_update_wf.cwl tests/empty.json
```
(I need to put the `--workDir` on local storage to work around https://github.com/common-workflow-language/cwltool/issues/1405.)
It thinks and logs for a minute or so and finally outputs:
```
{
  "a": 3,
  "b": 4
}
```
If I change to `--disableCaching False`, I get the answer that the conformance tests actually check for:
```
{
  "a": 4,
  "b": 4
}
```
The `a` answer is generated by a job that looks at the output file of a prior job, and is supposed to be waiting on another successor of that prior job to have made an in-place update. With caching off, either that update is not happening or it is not visible (in time?) to the job that wants to see it.
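For what it's worth, here is a toy Python model (my own sketch, nothing to do with Toil's actual code) of the difference I think I'm seeing, assuming that with caching off each job ends up working on its own exported copy of the file rather than on a shared cached copy:

```python
# Toy model of why an in-place update can go missing when each job works on
# its own copy of a file instead of a shared one. Not Toil code.
import os
import shutil
import tempfile

def run_with_shared_copy():
    store = tempfile.mkdtemp()
    path = os.path.join(store, "counter.txt")
    with open(path, "w") as f:
        f.write("3")          # job A creates the file
    with open(path, "w") as f:
        f.write("4")          # job B updates the same local copy in place
    with open(path) as f:
        return f.read()       # job C sees "4"

def run_with_private_copies():
    store = tempfile.mkdtemp()
    original = os.path.join(store, "counter.txt")
    with open(original, "w") as f:
        f.write("3")          # job A creates the file and "uploads" it
    b_copy = os.path.join(tempfile.mkdtemp(), "counter.txt")
    shutil.copy(original, b_copy)
    with open(b_copy, "w") as f:
        f.write("4")          # job B updates only its private copy; nothing re-uploads it
    c_copy = os.path.join(tempfile.mkdtemp(), "counter.txt")
    shutil.copy(original, c_copy)
    with open(c_copy) as f:
        return f.read()       # job C still sees the stale "3"

print(run_with_shared_copy(), run_with_private_copies())  # prints: 4 3
```

If that assumption is right, the in-place update is happening, but it never makes it back to anything the downstream job actually reads.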
I’m not sure why this hasn’t come up in the CI tests before; it has started failing on CI in the branch for #3323. My current theory is that when I changed the number of tests to run in parallel (which we had been overestimating, because we saw all of the Kubernetes host’s cores while only having access to some of them), it changed the usual outcome of a race condition.
@DailyDreaming Any idea what’s going on here? Can you replicate this issue when you run this test?
(This issue is synchronized with Jira task TOIL-807.)
Top GitHub Comments
I feel like we’re missing some code to detect and reupload files that CWL jobs tried to modify in place, and to get the new IDs for those files to other jobs that expect to see those modifications just by virtue of their execution order constraints.
If we don’t have that, how do we expect modified files to make it from one Mesos or Kubernetes node to another? We can’t commit the modification back to the original file ID.
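To make that concrete, here is a rough sketch of the kind of detect-and-reupload step I mean. The `run_and_reupload` helper, the `run_tool` callable, and the hashing approach are all hypothetical, invented for illustration; `writeGlobalFile` is the real Toil file-store method for adding a local file to the job store under a new ID:

```python
# Hypothetical sketch, not existing Toil code: after a CWL tool runs, notice
# which staged files it rewrote in place and push them back to the job store
# under fresh IDs so jobs on other nodes could fetch the updated contents.
import hashlib
import os

def _digest(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def run_and_reupload(file_store, staged, run_tool):
    """staged: dict mapping Toil FileID -> local path the tool may rewrite in place.
    run_tool: callable that actually executes the CWL tool against those paths."""
    before = {fid: _digest(path) for fid, path in staged.items()}
    run_tool(staged)
    new_ids = {}
    for fid, path in staged.items():
        if os.path.exists(path) and _digest(path) != before[fid]:
            # The tool rewrote this file; write it back as a new global file
            # so the modified contents are reachable from other nodes.
            new_ids[fid] = file_store.writeGlobalFile(path)
        else:
            new_ids[fid] = fid
    return new_ids
```

Even with something like this, we would still need a way to hand the `new_ids` mapping to jobs that depend on the modifying job only through execution order constraints, which is the part I think is missing.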
We’ve completely rewritten CWL filesystem support since this happened, and we now have `--bypass-file-store` to enable the in-place update requirement that CWL offers.