Proposal: WDL templates
This came from discussions at OpenBio Winter Codefest around how to allow people to manage external resources, specifically driven by Spark.
Ideally, you want to write generic code once that performs certain steps, e.g. manages Spark clusters (spin up, submit job(s), tear down), and then reuse it. Subworkflows already support this by letting you reuse workflows, but we also need the ability to inject custom tasks into the generic workflow template in order to customize it. Since the template concept would be WDL-specific, it would work for any externally managed resource: Cromwell does not need to be aware of Spark or any other similar resource; they all become user-definable.
Additionally, templates have uses beyond managed resources: e.g. a variant calling template that validates a BAM, runs some form of variant calling, and then produces metrics. Someone could import the template and inject their custom variant calling workflow or task.
The new keywords below are template, contract, and inject. template allows you to define a workflow into which one or more contract-satisfying tasks or workflows can be injected. contract specifies the contract (a relatively generic bioinformatics analysis action, such as "alignment") that defines inputs and outputs, but no other contents. inject tells the template which implementation workflow or task to inject in order to satisfy the contract.
template.wdl:
template workflow variant_calling_template {
  String in1

  contract user_defined_variant_caller {
    String input1
    output {
      String output1
    }
  }

  # this template defines the workflow that is being run, in this case call
  # task1, then task2, then the inner workflow, and finally task3
  call task1 { input: in1 = in1 }
  call task2 { input: in2 = task1.out1 }
  call user_defined_variant_caller { input: input1 = task2.out2 }
  call task3 { input: in3 = user_defined_variant_caller.output1 }

  output {
    String out1 = task3.out1
  }
}
Making use of a template in another workflow:
my_variant_caller.wdl:
import "template.wdl" as variant_calling_template_wdl
task my_variant_caller {
String input1
String input2
command { ... }
output { ... }
}
workflow my_variant_caller {
String in1
String in2
call variant_calling_template_wdl.variant_calling_template {
input: in1 = in1,
# NOTE: in trying to mock this out for a real example, I realized that you will
# often want to pass extra params to your task that the contract does
# not specify.
# I think the inject could work two ways:
# * only inject what the contact specifies:
# inject: user_defined_variant_caller = my_variant_caller
# * pass extra arguments beyond the contract as shown below (the contract
# arguments also fulfilled):
# inject: user_defined_variant_caller = my_variant_caller(input: input2=in2)
inject: user_defined_variant_caller = my_variant_caller(input: input2=in2)
}
output {
String out1 = variant_calling_template.out1
}
}
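For comparison, here is a minimal sketch of the first injection style (contract-only), assuming a hypothetical my_simple_caller task that needs nothing beyond what the contract specifies:

import "template.wdl" as variant_calling_template_wdl

# hypothetical task whose inputs and outputs exactly match the contract
task my_simple_caller {
  String input1
  command { ... }
  output {
    String output1 = "..."
  }
}

workflow my_simple_workflow {
  String in1

  call variant_calling_template_wdl.variant_calling_template {
    input: in1 = in1,
    # contract-only form: nothing beyond the contract is passed at inject time
    inject: user_defined_variant_caller = my_simple_caller
  }

  output {
    String out1 = variant_calling_template.out1
  }
}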
Below is an example that combines this proposal with https://github.com/openwdl/wdl/issues/183 to port the Hail WDL task I've been working on (https://github.com/broadinstitute/firecloud-tools/blob/ab_hail_wdl/scripts/hail_wdl_test/hail_test_cleanup.wdl) to the proposed syntax.
############################
# Template file contents:

task start_cluster {
  # TODO: a struct later would be much easier
  Map[String, String] dataproc_cluster_specs
  command { ... spin up cluster here ... }
  output {
    String cluster_name = "name of cluster made"
    String cluster_staging_bucket = "path to cluster's staging bucket"
  }
}

task delete_cluster {
  String cluster_name
  command { ... delete cluster here ... }
  output {}
}

# template workflow for running pyspark on dataproc
template workflow pyspark_dataproc {
  # TODO: a struct later would be much easier
  Map[String, String] dataproc_cluster_specs
  String cluster_name

  resources dataproc_cluster_manager {
    before start_cluster { input: dataproc_cluster_specs = dataproc_cluster_specs }
    call submit_job { input: cluster_name = cluster_name, cluster_staging_bucket = start_cluster.cluster_staging_bucket }
    after delete_cluster { input: cluster_name = cluster_name }
  }
}
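Note that, unlike the first example, this template does not show a contract for submit_job; presumably it would declare one inside the template workflow, roughly along the lines of this hypothetical sketch:

contract submit_job {
  String cluster_name
  String cluster_staging_bucket
  output {}
}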
############################
# User workflow file contents:

import "pyspark_dataproc.wdl" as pyspark_dataproc_wdl

# user defined task
task submit_job {
  String cluster_name
  String cluster_staging_bucket
  File hailCommandFile
  String inputVds
  String inputAnnot
  File outputVdsFileName
  File qcResultsFileName
  command { ... submit to the cluster and output to cluster staging bucket ... }
}

# workflow that uses template
workflow submit_hail_job {
  String cluster_name
  String cluster_staging_bucket
  File hailCommandFile
  String inputVds
  String inputAnnot
  File outputVdsFileName
  File qcResultsFileName

  call pyspark_dataproc_wdl.pyspark_dataproc {
    input: dataproc_cluster_specs = {"master_machine_type": "n1-standard-8", "master_machine_disk": "100"},
    inject: submit_job = submit_job(input: cluster_name = cluster_name, cluster_staging_bucket = cluster_staging_bucket,
                                    hailCommandFile = hailCommandFile, inputVds = inputVds, inputAnnot = inputAnnot,
                                    outputVdsFileName = outputVdsFileName, qcResultsFileName = qcResultsFileName)
  }

  output {
    # the template would need to surface an output for this to resolve
    String out1 = pyspark_dataproc.out1
  }
}
Top GitHub Comments
I totally agree about the other topics being more important; this was more to get discussion started.
I don’t quite understand the reproducibility or injection-at-runtime points though, so maybe I’m missing something there. The way I was picturing this is not very different from imports today. Imports already affect readability in that you need to look at all files to see what’s actually happening. In the case I outlined the import is effectively flipped: the template ends up importing the injected workflow or task. In the end you could flatten both into a single file and get the same results. Templates as I defined them are not strictly needed; without them you just have to add more explicit boilerplate to your workflows. You can always do a lot of copy-paste and get the same results.
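For illustration, a rough sketch (reusing the placeholder tasks from the first example above) of what that flattened, single-file version might look like:

workflow variant_calling_flattened {
  String in1
  String in2

  call task1 { input: in1 = in1 }
  call task2 { input: in2 = task1.out1 }
  # the previously injected task is simply called directly
  call my_variant_caller { input: input1 = task2.out2, input2 = in2 }
  call task3 { input: in3 = my_variant_caller.output1 }

  output {
    String out1 = task3.out1
  }
}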
Perhaps there is a different way to express this that looks less polymorphic, because that was not the original intention. Right now, if you import a task and use it, you satisfy the “contract” of that task or workflow by explicitly matching all of its inputs and outputs; the contract as I specified it was intended for the same purpose, by showing what the task or workflow would need to look like in order to be effectively imported into the template.
@cjllanwarne updated with the expanded argument syntax