digest code for inputs & tasks to inform call caching

precursor to #308 call caching

Call caching will work by recording (probably in a SQLite db), for each successfully completed task call, a digest code of the task source code + inputs, along with the output JSON. Then, when we’re newly asked to run a task on given inputs, we compute the digest code and query the database to see whether a previous run has the same one (and all of its output files still exist).
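
A minimal sketch of that record-and-lookup flow, using Python's standard `sqlite3` module. The table name `call_cache`, the helper names, and the `file_keys` parameter (names of outputs assumed to hold file paths) are hypothetical, chosen only for illustration; the issue does not fix a schema.

```python
import json
import os
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) a cache database with one row per cached call."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS call_cache ("
        "digest TEXT PRIMARY KEY, outputs_json TEXT NOT NULL)"
    )
    return db


def put(db, digest, outputs):
    """Record the output JSON of a successfully completed call."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO call_cache VALUES (?, ?)",
            (digest, json.dumps(outputs, sort_keys=True)),
        )


def get(db, digest, file_keys=()):
    """Return cached outputs, or None on a miss or if an output file is gone.

    file_keys (an assumption of this sketch) names the outputs whose values
    are file paths that must still exist for the cache entry to be usable.
    """
    row = db.execute(
        "SELECT outputs_json FROM call_cache WHERE digest = ?", (digest,)
    ).fetchone()
    if row is None:
        return None
    outputs = json.loads(row[0])
    if not all(os.path.exists(outputs[k]) for k in file_keys):
        return None
    return outputs
```

Keying the table on the digest alone keeps the lookup to a single indexed query; the file-existence check happens afterwards because it depends on the stored outputs.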

Digesting the inputs should be pretty easy: convert the inputs WDL.Env.Values to a dict with WDL.values_to_json, write them out to a JSON string with lexicographically ordered keys, and run a generic digest algorithm on that string.
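
As a rough sketch of the input digest: the function below assumes the inputs have already been converted to a plain JSON-compatible dict (for example by `WDL.values_to_json`, as described above). The function name and the choice of SHA-256 are assumptions; the issue only asks for "a generic digest algorithm".

```python
import hashlib
import json


def digest_inputs(inputs_json):
    """Digest a JSON-compatible dict of inputs, independent of key order."""
    # sort_keys gives lexicographically ordered keys; fixed separators avoid
    # whitespace differences between otherwise identical serializations.
    canonical = json.dumps(inputs_json, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two dicts with the same keys and values yield the same digest regardless of insertion order, while any changed value yields a different one.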

Digesting tasks will be interesting. Ideally, we’d like the digest code to ignore trivial changes to the source code, like whitespace, comments, and the order of declarations, while of course detecting any meaningful change to the task. That said, we can begin with something simpler, like digesting the substring of the .wdl file constituting the task source (the range of line & column numbers can be found from the pos attribute of the Task object).
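
The simpler starting point could look roughly like this. `Pos` is a hypothetical stand-in for the Task object's pos attribute, and the sketch assumes 1-based line and column numbers with `end_column` pointing one past the last character; the real attribute's conventions should be checked before relying on this.

```python
import hashlib
from collections import namedtuple

# Hypothetical stand-in for the Task object's pos attribute (assumed 1-based,
# with end_column one past the last character of the span).
Pos = namedtuple("Pos", ["line", "column", "end_line", "end_column"])


def source_span(source_text, pos):
    """Extract the substring of source_text covered by pos."""
    lines = source_text.split("\n")[pos.line - 1 : pos.end_line]
    if not lines:
        return ""
    # Trim the last line before the first so single-line spans stay correct.
    lines[-1] = lines[-1][: pos.end_column - 1]
    lines[0] = lines[0][pos.column - 1 :]
    return "\n".join(lines)


def digest_task_source(source_text, pos):
    """Digest only the task's own source text (SHA-256 is an assumption)."""
    return hashlib.sha256(source_span(source_text, pos).encode("utf-8")).hexdigest()
```

With this approach, edits elsewhere in the .wdl file leave the task digest unchanged, but whitespace or comment changes inside the task still alter it.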

Tasks are self-contained except for the definitions of any WDL struct types used therein. So ultimately the task digest would need to cover those struct type definitions as well as the task source code.
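
One possible way to fold struct definitions into the task digest is sketched below. It assumes the caller has already collected the source text of each struct type the task uses into a dict keyed by struct name; how to collect them is not covered here, and the function name is hypothetical.

```python
import hashlib


def digest_task_with_structs(task_source, struct_sources):
    """Digest task source together with the struct definitions it uses.

    struct_sources maps struct name -> definition source text (an assumed
    input shape). Structs are visited in sorted name order so the result
    does not depend on the order in which they were declared or collected.
    """
    h = hashlib.sha256()
    parts = [task_source]
    for name in sorted(struct_sources):
        parts.extend([name, struct_sources[name]])
    for part in parts:
        data = part.encode("utf-8")
        # Length-prefix each part so different splits of the same bytes
        # cannot produce the same digest.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()
```

The same pattern could be extended to workflows by including the digests of called tasks and subworkflows as additional parts.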

Later we’ll also want to be able to similarly digest entire workflows, which would need to cover the workflow source code as well as all called tasks (or subworkflows) and any struct types used.

Is there a way to achieve all this without needing to write a specialized digest method for every single AST node class? TBD.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 5

Top GitHub Comments

mlin commented, May 9, 2020

Here’s the WDL spec for structs btw to help orient: https://github.com/openwdl/wdl/blob/master/versions/1.0/SPEC.md#struct-definition
