assumptions around modifying input files, or making new files in their directories

It is suggested that the OpenWDL specification clarify the write status of input files and of the directories that contain them.

I suggest that a future version of the OpenWDL spec declare that all input files must be made read-only, so that behavior is consistent across runners.
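As a rough illustration of what "made read-only" could mean at the filesystem level, here is a minimal Python sketch that clears the write bits on a staged input. The helper name `make_read_only` is hypothetical and is not taken from the OpenWDL spec or any runner; it also assumes POSIX permission semantics, where a privileged user (for example root inside a container) can still write regardless of mode bits.

```python
import os
import stat

# All three write bits: owner, group, and other.
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def make_read_only(path: str) -> None:
    """Clear every write bit on `path`, leaving read/execute bits unchanged.

    Hypothetical helper for illustration only. Applied to a directory,
    this also stops unprivileged users from creating new files in it.
    """
    current = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, current & ~WRITE_BITS)
```

A runner would presumably call something like this on each input file, and possibly on the directory holding it, before starting the task command.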

This would also help with conversion to CWL, which has the same restriction unless InitialWorkDirRequirement is used to mark specific inputs as writable: true.

miniwdl has an I/O-expensive workaround: https://miniwdl.readthedocs.io/en/latest/runner_advanced.html#read-only-input-files

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

2 reactions
rhpvorderman commented, Mar 2, 2022

MUST is the only option in my opinion, mostly because WDL strives to be readable. Having to keep in mind that a task may manipulate its inputs adds very significant cognitive overhead. Even seemingly simple workflows might not be as simple as they seem, so each individual task would need to be inspected to confirm that only the outputs are affected by it, not the inputs.

I think that is a very undesirable state for WDL to be in. Reading a workflow from the top-level should be enough to infer what is happening.

1 reaction
rhpvorderman commented, Feb 22, 2022

> Regarding “making new files in their directories” – IIRC (correction welcomed) Cromwell drops input files into the task working directory, so I think practically this has to be allowed.

Cromwell creates a separate inputs directory that sits in the parent of the execution (working) directory. All files are resolved to absolute paths in Cromwell.

As for the samtools index case: BioWDL makes a hardlink. This works in Cromwell because inputs and execution are always on the same filesystem. It is an extremely ugly hack, and precisely for that reason we don’t use the samtools index task. Usually the utility that handles BAM files has an index command, so we just perform the indexing directly in the command that produces a new BAM file. That avoids a lot of hassle. It is really only a problem for tools that have no flag for specifying the index location when the index is not in the same directory as the BAM file. Maybe we should fix this issue there?
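The hardlink approach described above can be sketched roughly as follows. This is not BioWDL's actual task code; `link_into_workdir` is a hypothetical helper, and the sketch assumes both directories live on one filesystem, since `os.link` raises `OSError` otherwise.

```python
import os


def link_into_workdir(input_path: str, workdir: str) -> str:
    """Hard-link `input_path` into `workdir` and return the new path.

    Hypothetical helper for illustration. Files later created next to
    the link (for example an index) land in `workdir`, not in the
    inputs directory. The link shares its data with the original, so
    writing through it would still modify the input.
    """
    os.makedirs(workdir, exist_ok=True)
    target = os.path.join(workdir, os.path.basename(input_path))
    # Fails with OSError if input_path and workdir are on different filesystems.
    os.link(input_path, target)
    return target
```

The same-filesystem requirement is what makes this fragile outside a runner that guarantees it, which is part of why it reads as a hack.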

Anyway, sorry for the digression. I am in favor of enforcing read-only inputs. In terms of reproducibility, if I run task A and then run task A again on the same input, I should get the same result. That cannot be guaranteed if task A changes the input.
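One way to make the reproducibility concern concrete is to checksum the inputs before and after a task runs and report any that changed. The sketch below is only an illustration; `sha256_of` and `run_and_check_inputs` are hypothetical names, and `task` stands for any Python callable that receives the list of input paths.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_and_check_inputs(task, input_paths):
    """Run `task(input_paths)` and list the inputs whose content changed.

    Returns (task_result, changed_paths). A non-empty list means a second
    run on the "same" inputs would actually see different data.
    """
    before = {path: sha256_of(path) for path in input_paths}
    result = task(input_paths)
    changed = [path for path in input_paths if sha256_of(path) != before[path]]
    return result, changed
```

Hashing every input twice is expensive for large files, so this is a diagnostic sketch rather than something a runner would do on every call; enforcing read-only inputs avoids the need for it.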
