File Bundles and Secondary / Accessory Files

Dear WDL team!

I was hoping to get the community’s opinion on file bundles, which CWL calls secondary files. This issue grew out of https://github.com/broadinstitute/cromwell/issues/2269 and a discussion on the WDL forum, where there was general support for accessory files.

My main input is that I’d like the associated files to be attached to an explicit type, not annotated on a specific task. Even though it might be easy to determine at runtime whether the list of index files is different, I think it would be clearer to have a bundle type (which might follow the same import rules as structs).

CWL annotates each input with a list of secondaryFiles that uses a simple query syntax:

  1. If string begins with one or more caret ^ characters, for each caret, remove the last file extension from the path (the last period . and all following characters). If there are no file extensions, the path is unchanged.
  2. Append the remainder of the string to the end of the file path.

(They also allow for an expression that should resolve to a filename or an array of files)
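
As a rough illustration of the two rules above, here is a minimal Python sketch. The function name `resolve_secondary_file` is made up for this example and is not part of CWL or WDL. It only covers the string-pattern form (not expressions), and it assumes POSIX-style paths and Python's own notion of a file extension, which may differ from a real CWL runner in edge cases such as dotfiles.

```python
import posixpath


def resolve_secondary_file(primary_path: str, pattern: str) -> str:
    """Apply a CWL-style secondaryFiles string pattern to a primary file path.

    Hypothetical helper for illustration only.
    """
    path = primary_path
    # Rule 1: each leading caret removes the last file extension from the path.
    while pattern.startswith("^"):
        pattern = pattern[1:]
        # splitext returns the path unchanged when there is no extension.
        path, _ext = posixpath.splitext(path)
    # Rule 2: append the remainder of the pattern to the path.
    return path + pattern


if __name__ == "__main__":
    for pattern in (".bai", "^.txt"):
        print(pattern, "->", resolve_secondary_file("/data/sample.bam", pattern))
```

With a primary file of `/data/sample.bam`, the pattern `.bai` gives `/data/sample.bam.bai` and `^.txt` gives `/data/sample.txt`.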

My suggestions

Take these with a grain of salt, but I’ve got a few suggestions on how this might be expressed in WDL to get the ball rolling. I’ll use a modified indexed BAM that has three files ($base.bam, $base.bam.bai, $base.txt) to show how the examples hold up:

  1. Create a bundle that has an implicit base file type (like CWL) and references secondary files relative to this base file using a secondary-files selector syntax. The syntax could probably be the same as CWL’s, both for consistency and because, as far as I know, it does the job. If coerced into a File, the bundle should just resolve to the base file (in the following case, a BAM).
bundle NamedModifiedIndexBam {
  bai = ".bai"
  txt = "^.txt"
}

OR (these are two alternative spellings of the same idea; I’m not suggesting WDL should accept both)

bundle AnonymousModifiedIndexBam = [".bai", "^.txt"]

  2. Create a more explicit bundle that has a basename and must explicitly reference each associated file. It might be a good idea to have a base property, which would be the resolver when passing the bundle to the command line or when coercing it into a File. The benefits are that it doesn’t require the query language and that it is clearer.
bundle ExplicitModifiedIndexBam {
  bam = ".bam"	# or base = ".bam"
  bai = ".bam.bai"
  txt = ".txt"
}
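
To make the explicit variant a bit more concrete, here is a small Python sketch of how an engine might expand such a bundle from a basename. The names `EXPLICIT_MODIFIED_INDEX_BAM` and `expand_bundle` are hypothetical, and this is only one possible reading of the proposal, not how any WDL engine works today.

```python
from typing import Dict, Mapping

# Hypothetical stand-in for the ExplicitModifiedIndexBam bundle:
# each member name maps to the suffix appended to the shared basename.
EXPLICIT_MODIFIED_INDEX_BAM: Mapping[str, str] = {
    "bam": ".bam",
    "bai": ".bam.bai",
    "txt": ".txt",
}


def expand_bundle(basename: str, suffixes: Mapping[str, str]) -> Dict[str, str]:
    """Return the concrete path of every bundle member for a given basename."""
    return {member: basename + suffix for member, suffix in suffixes.items()}


if __name__ == "__main__":
    files = expand_bundle("/data/output", EXPLICIT_MODIFIED_INDEX_BAM)
    for member, path in files.items():
        print(member, "->", path)
```

Because every member is spelled out as a plain suffix, no caret-style query language is needed; the trade-off is that the base file has to be listed explicitly as well.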

Then you could reference them in the same way you do primitives or structs:

task my_task {
  AnonymousModifiedIndexBam bamFile
  NamedModifiedIndexBam namedBamFile
  ExplicitModifiedIndexBam explicitBamFile

  command {
    echo ${bamFile}             # :: $base.bam (coerced to the base file)
    echo ${namedBamFile.base}   # :: $base.bam
    echo ${explicitBamFile.bai} # :: $base.bam.bai
  }

  # example of outputs
  output {
    AnonymousModifiedIndexBam bamOut = glob("output.bam")
    NamedModifiedIndexBam namedOut = glob("output.bam")
    ExplicitModifiedIndexBam explicitOut = glob("output")
  }
}

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 13 (10 by maintainers)

Top GitHub Comments

7 reactions
DavyCats commented, Feb 18, 2019

I can definitely see some cases where something like this would be useful, indexes being one obvious example. I’m wondering whether it would be an option to simply allow defining defaults in a struct, rather than introducing an entirely new type. I’m guessing there was some reason this isn’t allowed, though, considering it is expressly not permitted.

struct indexedBamFile {
    File bam
    File index = bam + ".bai"
    File txt = sub(bam, "\\.bam$", ".txt")
}

Or just to throw out another idea, adding an Implicit (or Inferred) block:

struct indexedBamFile {
    File bam
    Implicit {
        File index = bam + ".bai"
        File txt = sub(bam, "\\.bam$", ".txt")
    }
}
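
Both variants above boil down to fields whose defaults are computed from `bam` unless the caller supplies them. As a loose analogy only (WDL structs do not currently permit this), the same idea can be sketched in Python with a dataclass; the class name `IndexedBamFile` and the use of `None` as the "not supplied" marker are assumptions made for this example.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexedBamFile:
    """Hypothetical analogue of a struct with defaults derived from `bam`."""

    bam: str
    index: Optional[str] = None
    txt: Optional[str] = None

    def __post_init__(self) -> None:
        # Fill in any member the caller did not supply, mirroring
        # `bam + ".bai"` and `sub(bam, "\\.bam$", ".txt")` from the comment.
        if self.index is None:
            self.index = self.bam + ".bai"
        if self.txt is None:
            self.txt = re.sub(r"\.bam$", ".txt", self.bam)


if __name__ == "__main__":
    print(IndexedBamFile(bam="sample.bam"))
    print(IndexedBamFile(bam="sample.bam", index="elsewhere/sample.bai"))
```

An explicitly supplied member overrides the derived default, which is the behaviour one would probably want when an index lives somewhere unusual.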

Either way, I feel that the second option you provide here would be nicer. Giving ^ a special meaning seems like it might get rather confusing, especially considering it already has a special meaning in regex.

3 reactions
geoffjentry commented, Feb 18, 2019

My first thought was to also latch on to the Struct concept somehow, although at the moment it doesn’t map cleanly as @DavyCats points out.

We could potentially describe a mechanism for (de)localizing entire Struct objects (I don’t think the spec currently allows for that) and then as @DavyCats describes, some scheme for pattern matching.
