The need for an "Unlocalized File" type
WDL writers currently have two options when dealing with files:
- Use a File type and have the file “localized” and sanitized (with respect to the original name of the file).
- Use a String type, access the raw URL directly (if you can), and deal with the resulting issues (see below).
- The File option is good for small files and “dumb” programs. It also makes call-caching possible, as the MD5 of the file can be stored in a DB.
- The String option is a good idea when accessing many small parts of large files (say, combining 80K VCFs). It requires smarter programs that know how to interact with an index and how to talk to the requisite filesystem, be it s3://, gs://, or http://. The problem with this option is that call-caching is effectively disabled, as the variable will have a different raw URL with every invocation of the task.
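To make the caching problem concrete, here is a minimal Python sketch of how a cache key might behave under each option. It is not how any particular engine computes its keys: `cache_key`, `content_md5`, the task name `merge`, and the example URLs are all hypothetical, chosen only to show that a key built from a per-invocation URL never repeats, while a key built from the file's MD5 does.

```python
import hashlib


def cache_key(task_name: str, inputs: dict) -> str:
    """Hypothetical cache key: a hash of the task name and its input values."""
    h = hashlib.sha256()
    h.update(task_name.encode())
    for name in sorted(inputs):
        h.update(b"\0" + name.encode() + b"\0" + str(inputs[name]).encode())
    return h.hexdigest()


def content_md5(data: bytes) -> str:
    """MD5 of the file contents, standing in for the hash an engine could store."""
    return hashlib.md5(data).hexdigest()


# The same underlying file, reached through two different raw URLs
# (assumed here to differ by a per-invocation token).
vcf_bytes = b"##fileformat=VCFv4.2\n"
url_run1 = "gs://bucket/sample.vcf?token=abc"
url_run2 = "gs://bucket/sample.vcf?token=xyz"

# String input: the key follows the URL, so it changes on every run.
string_key_1 = cache_key("merge", {"vcf": url_run1})
string_key_2 = cache_key("merge", {"vcf": url_run2})

# File-like input: the key follows the content hash, so it is stable.
file_key_1 = cache_key("merge", {"vcf": content_md5(vcf_bytes)})
file_key_2 = cache_key("merge", {"vcf": content_md5(vcf_bytes)})

print(string_key_1 == string_key_2)
print(file_key_1 == file_key_2)
```

An unlocalized file type would aim for the second behaviour without paying for the download.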
Since more and more of our pipelines are trying to use String instead of File as a cost-saving measure, I think it makes sense to see how we could enable call-caching without localizing a file. I have some ideas:
- Have some decoration on a File type indicate that it should not be localized and that the raw string should be used instead. This should enable call-caching without localization, as the implementations should be able to MD5 the underlying file.
- In the run-time attributes (or elsewhere), indicate which variables shouldn’t be localized (ugly).
- Add a new type, say RemoteFile, which acts like the first idea but without a decoration.
- Allow a decoration (mutable?) to indicate that changes in this variable should not invalidate the cache. One could then pull in only the MD5 of the file (which would be cached) as an additional variable from a separate task, and mark the String pointer to the file as mutable, thus guaranteeing that if the file changes, the task is rerun and the cache points to the right place (possibly the easiest to implement, but the hardest to maintain as a WDL author).
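The last idea can be sketched in Python as a cache key that skips inputs marked as mutable. This is only an illustration under stated assumptions: `key_ignoring_mutable`, the input names `vcf_url` and `vcf_md5`, and the example values are hypothetical, and no WDL engine is claimed to work this way.

```python
import hashlib


def key_ignoring_mutable(task_name: str, inputs: dict, mutable=frozenset()) -> str:
    """Hypothetical cache key that leaves out any input named in `mutable`."""
    h = hashlib.sha256()
    h.update(task_name.encode())
    for name in sorted(inputs):
        if name in mutable:
            continue  # changes to this input must not invalidate the cache
        h.update(b"\0" + name.encode() + b"\0" + str(inputs[name]).encode())
    return h.hexdigest()


mutable_inputs = frozenset({"vcf_url"})

# Same file content (same MD5) reached through two different URLs.
run_a = {"vcf_url": "gs://bucket/a.vcf?token=abc", "vcf_md5": "md5-of-version-1"}
run_b = {"vcf_url": "gs://bucket/a.vcf?token=xyz", "vcf_md5": "md5-of-version-1"}
# The file itself changed, so the separately computed MD5 differs.
run_c = {"vcf_url": "gs://bucket/a.vcf?token=xyz", "vcf_md5": "md5-of-version-2"}

key_a = key_ignoring_mutable("merge", run_a, mutable_inputs)
key_b = key_ignoring_mutable("merge", run_b, mutable_inputs)
key_c = key_ignoring_mutable("merge", run_c, mutable_inputs)

print(key_a == key_b)  # URL differs, content does not: cache hit
print(key_a == key_c)  # content differs: cache miss, task reruns
```

The maintenance cost mentioned above shows up here as the need to keep `vcf_md5` in sync with `vcf_url` by hand in every workflow.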
Other ideas or thoughts on the matter?
Issue Analytics
- State:
- Created 6 years ago
- Comments:9 (8 by maintainers)
Top GitHub Comments
@mlin and I did suggest adding a “stream” annotation inside the parameter_meta section. This is implemented and described in the dxWDL README. I don’t have a preference between an annotation and a modifier.
While porting the GATK4 pipeline, I realized it was using remote files, as @yfarjoun describes. The implementation calls Google NIO from the GATK4 jar file. It would be much nicer to have a proper declaration of these files; currently, their type is String.
This is implemented in various different ways in the engines (e.g. Cromwell's localization_optional).