Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Rethink Structural Types

See original GitHub issue

Originally, structural types were introduced to make the language fit better the underlying foundations (type theorists prefer structural), and to emulate the idea of “duck typing” in dynamic languages but with static guarantees. But it turned out that almost nobody uses them. It seems that a combination of traits and classes, together with type classes represented by implicit parameters gives enough flexibility, so the need for duck typing is rarely felt.

However, there is another area where statically-typed languages are often more awkward than dynamically-typed ones: database access. In a dynamically typed language, it’s quite natural to model a row as a record or object, and to select entries with simple dot notation, e.g. row.columnName. In a statically typed language. we can do that only if we somehow define a class for every possible row arising from a data-base manipulation (including rows arising from joins and projections), and set up a scheme to map between a row and the class representing it. This requires a lot of boilerplate code. So quite often one opts for a simpler scheme where column names are represented as strings that are passed to a select operator, e.g. row.select("columnName"). But this forgoes all the advantages of static typing and additionally is more awkward to write than the dynamically typed version.

A case in point is the Spark framework. The first version of Spark essentially supported distributed collections using RDDs. Used from Scala, this was very natural, an RDD was just some kind of collection, and was accessed in the same way as other Scala collections. Collection elements were defined by classes which were mapped transparently to database rows.

Later versions of Spark added database schemas (“data frames”) for better optimizations and multi-language support. But, sadly, this meant that some amount of type safety was lost and member access was now done via strings instead of the more natural dot notation.

It seems the most natural type to represent a row in a database scheme is a structural type, with one field for each column. But unfortunately this does not work, at least not with structural types as they are currently defined in Scala. The problem is that accessing a member of a structural types is always implemented in terms of accessing a field of method in a class, using Java reflection. For database access, this is not what we want; instead we would like to use the field name as a parameter for operation that’s defined by the system. In short, structural types are useless for database access because their member access implementation is not programmable.

The rest of this note describes a way to change that. It lays out a scheme to define programmatically the meaning of accessing a member of a structural type. The scheme is based on the idea of representing structural types programmatically, using “Selectables”. It is implemented in PR #1881.

Selectable is a trait defined as follows:

trait Selectable extends Any {
  def selectDynamic(name: String): Any
  def selectDynamicMethod(name: String, paramClasses: ClassTag[_]*): Any =
    new UnsupportedOperationException("selectDynamicMethod")
}

The principal method of a selectable is selectDynamic: It takes a field name and returns the value associated with that name in the selectable.

To make this precise, assume r is a value with structural type S. In general S is of the form C { Rs }, i.e. it consists of a class reference C and refinement declarations Rs. We call a field selection r.f structural if f is a name defined by a declaration in Rs whereas C defines no member of name f. Assuming the selection has type T, it is mapped to something equivalent to the following code:

(r: Selectable).selectDynamic("f").asInstanceOf[T]

That is, we make sure r conforms to type Selectable, potentially by adding an implicit conversion. We then invoke the get operation of that instance, passing the the name "f" as a parameter. We finally cast the resulting value back to the statically known type T.

Selectable also defines another access method called selectDynamicMethod. This operation is used to select methods instead of fields. It gets passed the class tags of the selected method’s formal parameter types as additional arguments. These can then be used to disambiguate one of several overloaded variants.

Package scala.reflect contains an implicit conversion which can map any value to a selectable that emulates reflection-based selection, in a way similar to what was done until now:

package scala.reflect

object Selectable {
  implicit def reflectiveSelectable(receiver: Any): scala.Selectable = receiver match {
    case receiver: scala.Selectable => receiver
    case _                          => new scala.reflect.Selectable(receiver)
  }
}

When imported, reflectiveSelectable provides a way to access fields of any structural type using Java reflection. This is similar to the current implementation of structural types. The main difference is that to get reflection-based structural access one now has to add an import like

import scala.relect.Selectable.reflectiveSelectable

On the other hand, the previously required language feature import of reflectiveCalls would now be redundant and can be dropped.

As you can see from its implementation above, reflectSelectable checks first whether its argument is already a run-time instance of Selectable, in which case it is returned directly. This means that reflection-based accesses only take place as a last resort, if no other Selectable is defined.

Other selectable instances can be defined in libraries. For instance, here is a simple class of records that support dynamic selection:

case class Record(elems: (String, Any)*) extends Selectable {
  def selectDynamic(name: String): Any = elems.find(_._1 == name).get._2
}

Record consists of a list of pairs of element names and values. Its selectDynamic operation finds the pair with given name and returns its value.

Let’s define a record value and cast it to a structural type Person:

type Person = Record { val name: String; val age: Int }
val person = Record(("name" -> "Emma", "age" -> 42)).asInstanceOf[Person]

(we get back to the issue of casting below, for now just note that the cast will succeed, as it checks at runtime only the erased portion of Person, which is Record).

Then person.name will have static type String, and will produce "Emma" as result.

The safety of this scheme relies on the correctness of the cast. If the cast lies about the structure of the record, the corresponding selectDynamic operation would fail. In practice, the cast would likely be part if a database access layer which would ensure its correctness.

It would be nice if the correctness of structural types could be ensured in a way less resembling pulling a rabbit our of your hat. Maybe this could be achieved by providing a language-defined bijection between structural types and a recursive generic type structure such as an HMap, i.e. an HList over pairs of labels and values. The idea is that one can define type-level operations over the generic type which implement data manipulations in a type-safe way. Structural types themselves do not lend themselves to recursive type-level operations because their fundamental shape is a set (of key/value pairs), not a recursive type such as a list. On the other hand, structural types naturally implement the natural subtyping one would expect for records, which HLists or HMaps cannot do.

Notes:

The scheme does not handle polymorphic methods in structural refinements. For now, such polymorphic methods are flagged as errors. It’s not clear whether the use case is common enough to warrant the additional complexity of supporting it.
There are clearly some connections with scala.Dynamic here, since both select members programmatically. But there are also some differences.

Fully dynamic selection is not typesafe, but structural selection is, as long as the correspondence of the structural type with the underlying value is as stated.
Dynamic is just a marker trait, which gives more leeway where and how to define reflective access operations. By contrast Selectable is a trait which declares the access operations.
One access operation, selectDynamic is shared between both approaches, but the other access operations are different. Selectable defines a selectDynamicMethod, which takes class tags indicating the method’s formal parameter types as additional argument. Dynamic comes with applyDynamic and updateDynamic methods, which take actual argument values.

It would be interesting to see whether we can arrive at a harmonization between the two schemes. If we only look at selectDynamic, this is easy: We can define a class

trait DynamicSelectable extends Selectable with Dynamic

So one selectDynamic operation can perform double duty for structural and dynamic dispatch. The differences between the other methods are a bit harder to bridge however.

Issue Analytics

State:
Created 7 years ago
Reactions:10
Comments:6 (5 by maintainers)

Top GitHub Comments

2reactions

ShaneDelmorecommented, Jan 27, 2017

I wonder if the reason structural types aren’t used more is the performance and cultural implications. I see people work around not wanting to use structural types with typeclasses, tuples, shapeless. I love the idea of being able to write a function that handles anything with a close function, without needing to write extra boilerplate for every new closeable I want to use it with. Someone will always show up to say what if someone misuses that feature on a class Relationship, where close indicates how near you are or something like that but in the end I want types that help me do my job and eliminate bugs and I have never actually had a bug caused by someone taking my generic reverse function and applying it to a vehicle.

1reaction

acdenhartogcommented, Feb 11, 2017

Perhaps it is worth pointing out that Spark used the stringy types so they could make use of an optimized SQL backend. They did not move away from RDD because they needed something more dynamic, it was really about performance. But they now have an experimental API called DataSet, which uses that same DataFrame and SQL backend but allows rows to be treated as arbitrary case classes. There is a casting step involved, but after that everything is type safe; or at least, in so far as the API has been implemented.

If this is about Spark then I suspect macros and implicit functions could be much more valuable to them. It would be nice to map things into Record as intermediate values instead of using a Tuple, but with this implementation that is just wasted performance. If the compiler just rewrites every Record as a Tuple, because the types are there, that would be nice. It has nothing to do with Dynamic, but it might be the kind of thing people actually want instead of just what they say they want.

Also, if you are going to do a Record syntax please remove those val keywords, those should be implied just like in case class. Although, in fact, they do not make any sense here because you cannot guarantee that field even exists, let alone that it is immutable! edit: completely misunderstood your Record example; in my opinion it would be more obvious with this equivalent: type Person = Selectable { val name: String; val age: Int } Having better understood it, I do like this whole Selectable idea.

edit: another possible cause of confusion: you are are reusing the method name selectDynamic, but not its common semantics. In your proposal, one could implement Selectable and selectDynamic for almost any class without having every possible method call on that class pass the type checker. i.e. Record(("name" -> "Emma", "age" -> 42)).selectWhatever does not compile.

@ShaneDelmore the next time I write a Vehicle class it is going extend Iterable[Carbon] so that I can reverse it, foreach over the tires, find passengers, override its generic companion, and filter the exhaust fumes or flatMap them into cancer patients. Thankfully, canBuildFrom means that could return a Set of patients, it is good for something after all.

Top Results From Across the Web

ReQL data types - RethinkDB

RethinkDB's basic data types include numbers, strings, boolean values, objects, arrays, and the null value. In addition, it stores RethinkDB-specific data ...

Rethink User Profile Types - Buildout Help Center

In this article you will learn about the three standard Profile types in Rethink and how to edit the permissions.

Business Entity Types 101: Is It Time To Rethink Your ... - Forbes

2. Limited liability company (LLC). The LLC business structure may be described as a bit of a cross between a sole proprietorship or...

A Typology of Potential Financing Structures for Population ...

1) Situations in which agreements about how costs are measured and how the savings are spent are made by the authorities;. 2) situations...

Navy to Rethink Force Structure Composition

It will look at the planned force mix and assess whether the service needs to adjust it for each type of ship, he...