Rethink Structural Types
See original GitHub issueOriginally, structural types were introduced to make the language fit better the underlying foundations (type theorists prefer structural), and to emulate the idea of “duck typing” in dynamic languages but with static guarantees. But it turned out that almost nobody uses them. It seems that a combination of traits and classes, together with type classes represented by implicit parameters gives enough flexibility, so the need for duck typing is rarely felt.
However, there is another area where statically-typed languages are often more awkward than dynamically-typed ones: database access. In a dynamically typed language, it’s quite natural to model a row as a record or object, and to select entries with simple dot notation, e.g. row.columnName
. In a statically typed language. we can do that only if we somehow define a class for every possible row arising from a data-base manipulation (including rows arising from joins and projections), and set up a scheme to map between a row and the class representing it. This requires a lot of boilerplate code. So quite often one opts for a simpler scheme where column names are represented as strings that are passed to a select operator, e.g. row.select("columnName")
. But this forgoes all the advantages of static typing and additionally is more awkward to write than the dynamically typed version.
A case in point is the Spark framework. The first version of Spark essentially supported distributed collections using RDDs. Used from Scala, this was very natural, an RDD was just some kind of collection, and was accessed in the same way as other Scala collections. Collection elements were defined by classes which were mapped transparently to database rows.
Later versions of Spark added database schemas (“data frames”) for better optimizations and multi-language support. But, sadly, this meant that some amount of type safety was lost and member access was now done via strings instead of the more natural dot notation.
It seems the most natural type to represent a row in a database scheme is a structural type, with one field for each column. But unfortunately this does not work, at least not with structural types as they are currently defined in Scala. The problem is that accessing a member of a structural types is always implemented in terms of accessing a field of method in a class, using Java reflection. For database access, this is not what we want; instead we would like to use the field name as a parameter for operation that’s defined by the system. In short, structural types are useless for database access because their member access implementation is not programmable.
The rest of this note describes a way to change that. It lays out a scheme to define programmatically the meaning of accessing a member of a structural type. The scheme is based on the idea of representing structural types programmatically, using “Selectables”. It is implemented in PR #1881.
Selectable
is a trait defined as follows:
trait Selectable extends Any {
def selectDynamic(name: String): Any
def selectDynamicMethod(name: String, paramClasses: ClassTag[_]*): Any =
new UnsupportedOperationException("selectDynamicMethod")
}
The principal method of a selectable is selectDynamic
: It takes a field name and returns the value associated with that name in the selectable.
To make this precise, assume r
is a value with structural type S
. In general S
is of the form C { Rs }
, i.e. it consists of a class reference C
and refinement declarations Rs
. We call a field selection r.f
structural if f
is a name defined by a declaration in Rs
whereas C
defines no member of name f
. Assuming the selection has type T
, it is mapped to something equivalent to the following code:
(r: Selectable).selectDynamic("f").asInstanceOf[T]
That is, we make sure r
conforms to type Selectable
, potentially by adding an implicit conversion. We then invoke the get
operation of that instance, passing the the name "f"
as a parameter. We finally cast the resulting value back to the statically known type T
.
Selectable
also defines another access method called selectDynamicMethod
. This operation is used to select methods instead of fields. It gets passed the class tags of the selected method’s formal parameter types as additional arguments. These can then be used to disambiguate one of several overloaded variants.
Package scala.reflect
contains an implicit conversion which can map any value to a selectable that emulates reflection-based selection, in a way similar to what was done until now:
package scala.reflect
object Selectable {
implicit def reflectiveSelectable(receiver: Any): scala.Selectable = receiver match {
case receiver: scala.Selectable => receiver
case _ => new scala.reflect.Selectable(receiver)
}
}
When imported, reflectiveSelectable
provides a way to access fields of any structural type using Java reflection. This is similar to the current implementation of structural types. The main difference is that to get reflection-based structural access one now has to add an import like
import scala.relect.Selectable.reflectiveSelectable
On the other hand, the previously required language feature import of reflectiveCalls
would now be redundant and can be dropped.
As you can see from its implementation above, reflectSelectable
checks first whether its argument is already a run-time instance of Selectable
, in which case it is returned directly. This means that reflection-based accesses only take place as a last resort, if no other Selectable
is defined.
Other selectable instances can be defined in libraries. For instance, here is a simple class of records that support dynamic selection:
case class Record(elems: (String, Any)*) extends Selectable {
def selectDynamic(name: String): Any = elems.find(_._1 == name).get._2
}
Record
consists of a list of pairs of element names and values. Its selectDynamic
operation finds the pair with given name and returns its value.
Let’s define a record value and cast it to a structural type Person
:
type Person = Record { val name: String; val age: Int }
val person = Record(("name" -> "Emma", "age" -> 42)).asInstanceOf[Person]
(we get back to the issue of casting below, for now just note that the cast will succeed, as it checks at runtime only the erased portion of Person
, which is Record
).
Then person.name
will have static type String
, and will produce "Emma"
as result.
The safety of this scheme relies on the correctness of the cast. If the cast lies about the structure of the record, the corresponding selectDynamic
operation would fail. In practice, the cast would likely be part if a database access layer which would ensure its correctness.
It would be nice if the correctness of structural types could be ensured in a way less resembling pulling a rabbit our of your hat. Maybe this could be achieved by providing a language-defined bijection between structural types and a recursive generic type structure such as an HMap
, i.e. an HList
over pairs of labels and values. The idea is that one can define type-level operations over the generic type which implement data manipulations in a type-safe way. Structural types themselves do not lend themselves to recursive type-level operations because their fundamental shape is a set (of key/value pairs), not a recursive type such as a list. On the other hand, structural types naturally implement the natural subtyping one would expect for records, which HLists or HMaps cannot do.
Notes:
-
The scheme does not handle polymorphic methods in structural refinements. For now, such polymorphic methods are flagged as errors. It’s not clear whether the use case is common enough to warrant the additional complexity of supporting it.
-
There are clearly some connections with
scala.Dynamic
here, since both select members programmatically. But there are also some differences.
-
Fully dynamic selection is not typesafe, but structural selection is, as long as the correspondence of the structural type with the underlying value is as stated.
-
Dynamic
is just a marker trait, which gives more leeway where and how to define reflective access operations. By contrastSelectable
is a trait which declares the access operations. -
One access operation,
selectDynamic
is shared between both approaches, but the other access operations are different.Selectable
defines aselectDynamicMethod
, which takes class tags indicating the method’s formal parameter types as additional argument.Dynamic
comes withapplyDynamic
andupdateDynamic
methods, which take actual argument values.
It would be interesting to see whether we can arrive at a harmonization between the two schemes. If we only look at selectDynamic
, this is easy: We can define a class
trait DynamicSelectable extends Selectable with Dynamic
So one selectDynamic
operation can perform double duty for structural and dynamic dispatch. The differences between the other methods are a bit harder to bridge however.
Issue Analytics
- State:
- Created 7 years ago
- Reactions:10
- Comments:6 (5 by maintainers)
Top GitHub Comments
I wonder if the reason structural types aren’t used more is the performance and cultural implications. I see people work around not wanting to use structural types with typeclasses, tuples, shapeless. I love the idea of being able to write a function that handles anything with a close function, without needing to write extra boilerplate for every new closeable I want to use it with. Someone will always show up to say what if someone misuses that feature on a class Relationship, where close indicates how near you are or something like that but in the end I want types that help me do my job and eliminate bugs and I have never actually had a bug caused by someone taking my generic reverse function and applying it to a vehicle.
Perhaps it is worth pointing out that Spark used the stringy types so they could make use of an optimized SQL backend. They did not move away from
RDD
because they needed something more dynamic, it was really about performance. But they now have an experimental API calledDataSet
, which uses that sameDataFrame
and SQL backend but allows rows to be treated as arbitrary case classes. There is a casting step involved, but after that everything is type safe; or at least, in so far as the API has been implemented.If this is about Spark then I suspect macros and implicit functions could be much more valuable to them.
It would be nice to map things intoRecord
as intermediate values instead of using aTuple
, but with this implementation that is just wasted performance. If the compiler just rewrites everyRecord
as aTuple
, because the types are there, that would be nice. It has nothing to do withDynamic
, but it might be the kind of thing people actually want instead of just what they say they want.Also, if you are going to do aedit: completely misunderstood yourRecord
syntax please remove thoseval
keywords, those should be implied just like incase class
. Although, in fact, they do not make any sense here because you cannot guarantee that field even exists, let alone that it is immutable!Record
example; in my opinion it would be more obvious with this equivalent:type Person = Selectable { val name: String; val age: Int }
Having better understood it, I do like this wholeSelectable
idea.edit: another possible cause of confusion: you are are reusing the method name
selectDynamic
, but not its common semantics. In your proposal, one could implementSelectable
andselectDynamic
for almost any class without having every possible method call on that class pass the type checker. i.e.Record(("name" -> "Emma", "age" -> 42)).selectWhatever
does not compile.@ShaneDelmore the next time I write a
Vehicle
class it is going extendIterable[Carbon]
so that I can reverse it, foreach over the tires, find passengers, override its generic companion, and filter the exhaust fumes or flatMap them into cancer patients. Thankfully, canBuildFrom means that could return aSet
of patients, it is good for something after all.