The need for an "Unlocalized File" type

WDL writers currently have two options when dealing with files:

  1. Use a File type and have the file “localized” and sanitized (with respect to the original name of the file).
  2. Use a String type, access the raw URL directly (if you can), and deal with the resulting issues (see below).

  1. A File is good for small files and “dumb” programs. It also makes call-caching possible, as the MD5 of the file can be stored in a DB.
  2. A String is a good idea when accessing many small parts of large files (say, combining 80K VCFs). It requires smarter programs that know how to interact with an index and how to talk to the requisite filesystem, be it s3://, gs://, or http://. The problem with this option is that call-caching is effectively disabled, as the variable will have a different raw URL with every invocation of the task.
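To make the call-caching trade-off concrete, here is a minimal Python sketch of a hypothetical cache-key function. It is not Cromwell's or any other engine's actual algorithm. It assumes File inputs are keyed by the MD5 of their content, with an in-memory dict standing in for the storage backend, while String inputs are keyed by their literal value. Two runs that read identical data from different per-run URLs hit the cache only when the input is declared as a File.

```python
import hashlib
import json
from typing import Dict, Tuple


# Hypothetical cache-key scheme for illustration only; real engines differ.
def cache_key(task: str,
              inputs: Dict[str, Tuple[str, str]],
              storage: Dict[str, bytes]) -> str:
    """inputs maps name -> (wdl_type, value); storage maps URL -> file bytes."""
    parts = {"task": task}
    for name, (wdl_type, value) in sorted(inputs.items()):
        if wdl_type == "File":
            # File: key on the content's MD5, so the URL itself does not matter
            parts[name] = hashlib.md5(storage[value]).hexdigest()
        else:
            # String: key on the literal value, i.e. the raw URL
            parts[name] = value
    return hashlib.md5(json.dumps(parts, sort_keys=True).encode()).hexdigest()


# The same bytes written to a different location by each workflow run
vcf = b"##fileformat=VCFv4.2\n"
storage = {
    "gs://bucket/run1/sample.vcf": vcf,
    "gs://bucket/run2/sample.vcf": vcf,
}


def key_for(run: str, wdl_type: str) -> str:
    url = f"gs://bucket/{run}/sample.vcf"
    return cache_key("combine_vcfs", {"vcf": (wdl_type, url)}, storage)


print(key_for("run1", "File") == key_for("run2", "File"))      # True: cache hit
print(key_for("run1", "String") == key_for("run2", "String"))  # False: cache miss
```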

Since more and more of our pipelines are trying to use String instead of File as a cost-saving measure, I think it makes sense to see how we could enable call-caching without localizing a file. I have some ideas:

  1. Have some decoration on a File type indicate that it should not be localized and that the raw string should be used instead. This should enable call-caching without localization, as the implementations should be able to MD5 the underlying file.
  2. In the runtime attributes (or elsewhere), indicate which variables shouldn’t be localized (ugly).
  3. Add a new type, say RemoteFile, which acts like the solution in 1 but without a decoration.
  4. Allow a decoration (mutable?) to indicate that changes in this variable should not invalidate a cache. One could then pull in only the MD5 of the file (which would be cached) as an additional variable from a separate task, and mark the String pointer to the file as mutable, thus guaranteeing that if the file changes, the task is rerun and the cache points to the right place (possibly the easiest to implement, but the hardest to maintain as a WDL author).
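Idea 4 can be sketched the same way. In this minimal Python sketch, `mutable` stands for the hypothetical decoration proposed above: inputs flagged with it are left out of the cache key, while a separate upstream task (modelled here by a hypothetical `md5_task` function) supplies the file's MD5 as an ordinary, hashed input. Moving the file to a new URL still hits the cache; changing its content forces a rerun.

```python
import hashlib
import json
from typing import Dict, Set


def cache_key(task: str, inputs: Dict[str, str], mutable: Set[str]) -> str:
    """Hash every input except those decorated as 'mutable' (hypothetical)."""
    parts = {name: value for name, value in inputs.items() if name not in mutable}
    parts["__task__"] = task
    return hashlib.sha256(json.dumps(parts, sort_keys=True).encode()).hexdigest()


def md5_task(content: bytes) -> str:
    """Stands in for the separate task that emits the file's MD5."""
    return hashlib.md5(content).hexdigest()


original = b"chr1\t100\t.\tA\tG\n"
edited = b"chr1\t100\t.\tA\tT\n"


def run(url: str, content: bytes) -> str:
    # The URL is passed as a mutable String; its MD5 is an ordinary input.
    inputs = {"vcf_url": url, "vcf_md5": md5_task(content)}
    return cache_key("combine_vcfs", inputs, mutable={"vcf_url"})


moved = run("gs://bucket/run1/a.vcf", original) == run("gs://bucket/run2/a.vcf", original)
changed = run("gs://bucket/run1/a.vcf", original) == run("gs://bucket/run1/a.vcf", edited)
print(moved)    # True: only the URL changed, so the cached result is reused
print(changed)  # False: the content changed, so the task reruns
```

The maintenance cost mentioned in idea 4 is visible here: correctness depends on the author remembering to wire `vcf_md5` into every task that marks `vcf_url` as mutable. Drop it, and an edited file would silently reuse stale results. Ideas 1 and 3 would instead let the engine compute that MD5 itself, without a user-written task.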

Other ideas or thoughts on the matter?

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 9 (8 by maintainers)

Top GitHub Comments

orodeh commented, Jan 12, 2018

@mlin and I suggested adding a “stream” annotation inside the parameter_meta section. This is implemented and described in the dxWDL README. I don’t have a preference between an annotation and a modifier.

While porting the GATK4 pipeline, I realized it was using remote files, as @yfarjoun describes. The implementation calls Google NIO from the GATK4 jar file. It would be much nicer to have a proper declaration of these files; currently, their type is String.
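The access pattern behind this, reading only the pieces of a large remote file that a task needs, comes down to ranged reads. The following minimal Python sketch shows the idea for http(s):// URLs using only the standard library. It is not how Google NIO or GATK4 are implemented, it assumes the server honours HTTP Range requests, and the URL in the usage comment is a placeholder.

```python
import urllib.request


def range_request(url: str, start: int, length: int) -> urllib.request.Request:
    """Build a GET request for bytes [start, start + length) of a remote file."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    end = start + length - 1  # HTTP byte ranges are inclusive at both ends
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


def read_range(url: str, start: int, length: int, timeout: float = 30.0) -> bytes:
    """Fetch only the requested slice instead of localizing the whole file."""
    with urllib.request.urlopen(range_request(url, start, length), timeout=timeout) as resp:
        # 206 Partial Content means the server actually honoured the range
        if resp.status != 206:
            raise RuntimeError(f"server ignored the Range header (HTTP {resp.status})")
        return resp.read()


# Usage (placeholder URL): read 64 KiB starting at an offset taken from an index
# chunk = read_range("https://example.org/cohort.vcf.gz", 1_048_576, 65_536)
```

In practice the offsets would come from an index (for example, a .tbi file for a bgzipped VCF), which is why the String approach needs index-aware programs.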

patmagee commented, Nov 20, 2019

This is implemented in various ways by the different engines (e.g. Cromwell's localization_optional).
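Both comments describe engine-side opt-outs expressed in parameter_meta: dxWDL's “stream” annotation and Cromwell's localization_optional. The minimal Python sketch below is a hypothetical engine helper, not code from either engine. It decides, per File input, whether to localize the file or hand the task the raw URL. The parameter_meta shapes it accepts are assumptions modelled on those two annotations, and `fake_localize` is a made-up stand-in for a real download step; each engine's documentation gives the exact syntax and which backends honour it.

```python
from typing import Any, Callable, Dict


def should_localize(name: str, parameter_meta: Dict[str, Any]) -> bool:
    """Return False when the input is annotated as not needing localization."""
    meta = parameter_meta.get(name)
    if meta == "stream":  # dxWDL-style annotation: bam: "stream"
        return False
    if isinstance(meta, dict) and meta.get("localization_optional") is True:
        return False      # Cromwell-style: vcf: { localization_optional: true }
    return True


def resolve_file_inputs(files: Dict[str, str],
                        parameter_meta: Dict[str, Any],
                        localize: Callable[[str], str]) -> Dict[str, str]:
    """Map each File input to a local path, or keep its URL if opted out."""
    return {
        name: localize(url) if should_localize(name, parameter_meta) else url
        for name, url in files.items()
    }


def fake_localize(url: str) -> str:
    """Hypothetical localizer: pretend every download lands in /tmp/inputs."""
    return "/tmp/inputs/" + url.rsplit("/", 1)[-1]


resolved = resolve_file_inputs(
    {"ref": "gs://b/ref.fa", "bam": "gs://b/x.bam", "vcf": "gs://b/y.vcf"},
    {"bam": "stream", "vcf": {"localization_optional": True}},
    fake_localize,
)
print(resolved)  # ref gets a local path; bam and vcf keep their gs:// URLs
```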
