'File is missing' error accessing a file in a Directory: Toil expects a file:// IRI prefix that was never added
We are experiencing failures using Toil to run our CWL workflow, and suspect a bug in Toil’s handling of Directory inputs. The symptom is the error message “File is missing: /the/path/to/some/file” (note that this is a bare file path, without an IRI scheme prefix).
Inspecting uploadFile() in src/toil/cwl/cwltoil.py shows that the logic expects uf["location"] to hold an IRI with a file:// scheme prefix: the check if not os.path.isfile(uf["location"][7:]): trims the first 7 characters, which is exactly the length of "file://". That prefix should have been added when the location was resolved relative to the CWL document on the filesystem, but the error message shows that it was not. Because the first 7 characters are stripped whether or not the prefix is present, a bare path such as /the/path/to/some/file gets truncated and no longer names a valid file on the filesystem.
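To make the failure mode concrete, here is a minimal sketch contrasting the unconditional [7:] slice with a prefix-aware conversion. Both helper names (naive_local_path, robust_local_path) are hypothetical and for illustration only; this is not Toil's code, just the standard library applied to the paths from the error message above.

```python
from urllib.parse import urlparse, unquote


def naive_local_path(location: str) -> str:
    # Mirrors the check described above: blindly drop the first 7
    # characters on the assumption that they are "file://".
    return location[7:]


def robust_local_path(location: str) -> str:
    # Hypothetical alternative: strip the scheme only when it is actually
    # present, and decode percent-escapes (e.g. %20 for a space).
    if location.startswith("file://"):
        return unquote(urlparse(location).path)
    return location


bare = "/the/path/to/some/file"
print(naive_local_path(bare))                # th/to/some/file  (truncated)
print(robust_local_path("file://" + bare))   # /the/path/to/some/file
print(robust_local_path(bare))               # /the/path/to/some/file
```

With a bare path, the naive slice eats "/the/pa", which matches the kind of breakage the error message points to.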
The code paths that resolve IRIs for File and Directory objects are different, and I suspect a bug in the latter. In particular, File locations that lack a scheme are resolved to IRIs via a call to schema_salad.ref_resolver.file_uri(); suspiciously, no such call is present in the code path that resolves Directory objects.
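The kind of normalization that the Directory code path appears to be missing could look roughly like the sketch below: before any check that slices off "file://", every scheme-less location in a Directory and its listing is resolved against the CWL document's directory and turned into a file:// IRI. This is an assumption-laden illustration, not Toil's actual fix. The helpers to_file_uri and normalize_locations and the base_dir parameter are hypothetical, and pathlib.Path.as_uri() stands in for schema_salad.ref_resolver.file_uri().

```python
import os
from pathlib import Path


def to_file_uri(location: str, base_dir: str) -> str:
    # Leave anything that already carries a scheme (file://, http://, ...) alone.
    if "://" in location:
        return location
    # Resolve relative paths against the directory holding the CWL document,
    # then let pathlib build a percent-encoded file:// URI (POSIX paths assumed).
    return Path(os.path.abspath(os.path.join(base_dir, location))).as_uri()


def normalize_locations(obj: dict, base_dir: str) -> None:
    # Rewrite a CWL File/Directory object in place, recursing into a
    # Directory's "listing" so nested entries get the same treatment.
    if "location" in obj:
        obj["location"] = to_file_uri(obj["location"], base_dir)
    for child in obj.get("listing", []):
        normalize_locations(child, base_dir)
```

For example, a Directory with location "data" under base_dir "/work/pgap" becomes "file:///work/pgap/data", and a listing entry that is already "file:///..." is left untouched, so a later [7:] slice would see the prefix it expects.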
To reproduce this failure, you may try to run the NCBI PGAP workflow. Please note that this pipeline is still a work in progress and has not yet been formally announced or released; the URL may change: https://github.com/ncbi-gpipe/pgap
Issue is synchronized with Jira Story TOIL-276.
Issue Analytics
- Created 5 years ago
- Reactions: 1
- Comments: 7 (4 by maintainers)
Top GitHub Comments
I’ll try to come up with a minimalist example that reproduces the failure. If there’s any specific debugging/logging you’d like me to add, feel free to ask. Here’s the Python stack trace at the point of failure; if you want full logs, I can attach those too.
This bug is no longer present in toil-cwl-runner 5.5.0 (and was probably fixed in an earlier release).