question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Automatic Coercions Considered Harmful

See original GitHub issue

Automatic coercions considered harmful

Authors: O. Rodeh with J. Didion (DNAnexus Inc.)

This issue describes a line of thought regarding type checking for WDL programs. It is based on experience acquired while working with DNAnexus customers on their WDL pipelines. The author also wrote a WDL type checker, evaluator, and cloud executor.

Coercions

WDL makes wide ranging use of coercions to convert one type to another. This is, in general, a well accepted idea used in a wide variety of programming languages. At DNAnexus we have run into several instances where WDL workflows errored out because static type checking was too lenient. This resulted in long running expensive customer pipelines failing. The issue is twofold: the turn around time is long, and the cost can run to thousands of dollars.

For example, under the WDL type system, a File is auto-coercible into File?. However, this is not the case in the DNAnexus simpler type-system. If you have a task that takes an Array[File?] and call it with Array[File] this will fail. For example:

version 1.0

task foo {
  input {
    Array[File?] bams
  }
  command {}
}


workflow bar {
  input {
    Array[File] my_files
  }

  call long_expensive_calculation {}

  call foo { input : bams = my_files }

  output {}
}

Workflow bar will fail when it calls foo, after it has spent a long time running an expensive calculation. This is frustrating for the users. We could prevent this, as well as many other errors, by performing stricter type checking. Below is an eclectic list of automatic coercions that appears in GATK pipelines. The GATK pipelines are large real world workflows that our customers expect to execute flawlessly on our platform. The code is a simplified version of the real program lines, stripping a way the context to focus on the main point.

Converting between T and T? occurs automatically

  # convert an optional file-array to a file-array
  Array[File] files = ["a", "b"]
  Array[File]? files2 = files

  # Convert an optional int to an int
  Int a = 3
  Int? b = a

  # An array with a mixture of Int and Int?
  Int c = select_first([1, b])

Converting between strings, integer, and floats “just works”.

  # Convert an integer to a string
  String s = 3
  Int z = 1.3

  # converting Array[String] to Array[Int]
  Int d = "3"
  Array[Int] numbers = ["1", "2"]

The branches of a conditional may not resolve to the same type. Here, one side is Array[String], the other is Array[String?].

  # https://github.com/gatk-workflows/gatk4-germline-snps-indels/blob/master/JointGenotyping-terra.wdl#L556                        
  #                                                                                                                                
  Array[String] birds = ["finch", "hawk"]                                                                                          
  Array[String?] mammals = []                                                                                                      
  Array[String?] zoo = if (true) then birds else mammals                                                                           

Since T and T? are interchangeable, we can perform operations like T + T? + T.

  # https://github.com/gatk-workflows/broad-prod-wgs-germline-snps-indels/blob/master/JointGenotypingWf.wdl#L570
  #
  # Adding a String and String?
  String? here = "here"
  String buf = "not " + "invented " + here

  # https://github.com/gatk-workflows/broad-prod-wgs-germline-snps-indels/blob/master/PairedEndSingleSampleWf.wdl#L1207
  #
  # convert an optional int to string.
  Int? max_output = 3                                                                                                              
  String s1 = "MAX_OUTPUT=" + max_output

The following non-intuitive code type checks in WDL (womtool-51.jar) :

Array[String] strings = [1, 1.2, "boo"]                                                                                            
Array[Int] numbers = [1, 1.2, "3.0"]                                                                                               

And then aai type checks too:

Array[Array[Int]] aai = [[1, 2], ["foo"], [1.4, 4.9, 9.3]]                                                                         

Proposal

Explicit coercions are useful and make sense, for example:

File? x = y                                                                                                                        
Array[File] af = ["a.txt", "b.txt"]                                                                                                

However, auto-coercions can lead to unexpected behavior and costly/confusing job failures. We suggest allowing explicit coercions, adding a few necessary stdlib function for type conversion, and disallowing automatic coercions. A few necessary stdlib functions, written a la Scala, are:

T? Some(T)                                                                                                                         
Int toInt(String)                                                                                                                  
String toString(Int|Float)                                                                                                         
File toFile(String)                                                                                                                

In order to select the first non empty integer, as below:

  Int c = select_first([1, b])                                                                                                     

one would write:

Int c = select_first([Some(1), b])                                                                                                 

This would make the typing of select_first accurate; it would only take an array of T?.

Code like:

 # Convert an optional int to an int                                                                                               
  Int a = 3                                                                                                                        
  Int? b = a                                                                                                                       

would be written:

Int? b = Some(a)

A Pythonic way of writing these functions would be to have a single cast function that takes a type and a value and either casts the value to the type or throws an error. For example:

Int i = cast(Int, "hello")                                                                                                         
String s = cast(String, 1)                                                                                                         
File f = cast(File, "foo/bar.txt")                                                                                                 
File err = cast(File, 1)  # throws an error because Int to File casting is not allowed                                             

cast could even handle optional types, which would get rid of the need for Some(), e.g.

String s1 = "foo/bar.txt"                                                                                                          
String? s2 = None                                                                                                                  
File? f1 = cast(File?, s1)  # f1 == Some(File("foo/bar.txt"))                                                                      
File? f2 = cast(File?, s2) # f2 == None                                                                                            

I hope this approach would reduce the number of preventable runtime errors.

Issue Analytics

  • State:open
  • Created 3 years ago
  • Comments:6 (4 by maintainers)

github_iconTop GitHub Comments

1reaction
orodehcommented, Jun 11, 2020

Forcing people to be explicit about their coercions does make WDL more verbose. However, it opens the door to using type inference (@cjllanwarne mentioned this a long time ago).

The WDL type system is actually quite complex. We have overloading, size can be called in several different ways. We also have parametric polymorphism, cross and zip work on any type of array. On top of that, we have auto-coercions, which interact badly with the former two features. If we drop them, I -think-, we could do Mindley-Milner type inference which would make WDL less verbose.

A realistic way of doing this is support type-inference inside a task or a workflow, supplemented with explicit coercions when needed.

0reactions
Gvaihircommented, Aug 15, 2022

How about type coercion of the inputs/outputs from the standard library functions? On the one hand - all inputs and outputs must be typed. But I do see some use cases where:

scatter (pair in zip(foo, bar)) {}

I could not find any such case in the spec, and I think this is a bit contradicting the static typing requirement. In general though seems these examples compile and run, but we start to see coercion problems when nesting such functions and applying them to optional types. Like this:

Array[Int] foo
Array[Int] bar
Array[String?] lol

scatter (nested_pair in zip(zip(foo, bar), lol)) {}

In this case we see the problems of coercion for values of lol which were evaluated to null, throwing exceptions at runtime. If we declare and type the outputs - all fine, no exceptions:

Array[Pair[Pair[Int, Int], String?]] arr = zip(zip(foo, bar), lol)
scatter (nested_pair in arr) {}

We can solve this in dxCompiler and allow such scenarios, but isn’t this in the realm of automatic type coercion and contradictory to strong/static typing of the language?

Could you provide more explicit guidance on typing the inputs/outputs for the standard library functions in the language spec?

Read more comments on GitHub >

github_iconTop Results From Across the Web

What are the dangers of type coercion in JavaScript? - Quora
For a beginner who isn't completely familiar with JavaScript's quirky behaviours, the automatic coercions may do things that are decidely “unintuitive” leading ...
Read more >
An Empirical Study of Implicit Type Conversions in JavaScript
Abstract. Most popular programming languages support situations where a value of one type is converted into a value of another type without any...
Read more >
8.1 — Implicit type conversion (coercion) - Learn C++
Implicit type conversion (also called automatic type conversion or coercion) is performed automatically by the compiler when one data type ...
Read more >
JavaScript type coercion explained - freeCodeCamp
Type coercion is the process of converting value from one type to another ... types automatically, and it is called implicit type coercion....
Read more >
Types
16.) Coercions can be dangerous because they are not explicit, but depend on declarations that may be some way away in the program...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found