Using the type system to represent a recordset

I am trying to represent a recordset / query result using the type system.

It will have 0 or more rows.
It will have 0 or more columns.
Values in the same column should always have the same type. Or, phrased alternatively, every record should conform to the same type.

The goal is to provide a library of functions that help with these resultsets: e.g. remove a field, calculate a new field, join or union two resultsets, etc.

First attempt using lists of lists

The first attempt to describe this uses a List of List of optional values:

enum Value:
  IntValue(i: Int)
  StringValue(s: String)
  
struct Resultset(records: List[List[Option[Value]]])

resultset = Resultset([
  [Some(StringValue("foo")), Some(IntValue(123)), None],
  [Some(StringValue("bar")), Some(IntValue(456)), Some(IntValue(789))]
])

This works, but there are definitely cons to this approach:

Excessive wrapping/boxing of values which makes it harder to do anything with it. Generally we would not define a resultset statically like this (it would come from a DB). But any resultset manipulation will require extremely cumbersome pattern matching, and handling of cases that we know will never occur.
We don’t actually use the type system but work around it. As a result, it doesn’t actually enforce things like every column having the same type, or that every row has the same number of fields.
The standard JSON encoding of this data structure is kind of iffy.
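
To make the first drawback concrete, here is a minimal sketch in Python (not Bosatsu) of what a simple column operation looks like against the wrapped representation. The function name `sum_column` is hypothetical, and the dataclasses only mimic the `Value` enum from the issue.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class IntValue:
    i: int


@dataclass(frozen=True)
class StringValue:
    s: str


# Python stand-in for the two-case Value enum in the issue.
Value = Union[IntValue, StringValue]
Row = List[Optional[Value]]


def sum_column(records: List[Row], index: int) -> int:
    """Hypothetical helper: sum an integer column, skipping missing cells."""
    total = 0
    for row in records:
        cell = row[index]
        if cell is None:
            continue  # missing value, nothing to add
        if isinstance(cell, IntValue):
            total += cell.i
        else:
            # We "know" this column only holds ints, but the row type
            # cannot express that, so the case still has to be handled.
            raise TypeError(f"expected IntValue, got {cell!r}")
    return total


records = [
    [StringValue("foo"), IntValue(123), None],
    [StringValue("bar"), IntValue(456), IntValue(789)],
]
```

Every operation on a column ends up with the same unwrap-and-reject boilerplate, which is the cost described above.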

Second attempt using lists of tuples

Rather than using lists for the individual rows, we could use tuples. This will enforce that every row will have the same number of fields, and that every column will have the same type.

struct Resultset(records: List[a])
resultset = Resultset([
  # A minor variation would be to use a struct for records, rather than a tuple
  ("foo", 123, None), 
  ("bar", 456, Some(789))
])

Unfortunately, this doesn’t really work for our use case, because:

It’s basically impossible to build generic functions that work on the Resultset type, because the type variable is completely undefined.
The type gets “fixed” the first time we use it, which means all the other resultsets in the same script will have to use the exact same record type, which generally is not the case in practice.

Questions

Am I missing something? Is it possible to use richer types than List[List[Option[Value]]]?
Could certain language feature additions make this better?

Issue Analytics

State:
Created 4 years ago
Comments: 26 (10 by maintainers)

Top GitHub Comments

johnynek commented, Mar 27, 2019

Thanks for the issue. The issues of dealing with generic programming on statically typed records are real. This is a known pain point in Haskell and Scala.

To make the first approach a bit more palatable, I would note you could introduce some helpers:

enum Value:
  IntValue(i: Int)
  StringValue(s: String)
  
struct Resultset(records: List[List[Option[Value]]])

def str(s): Some(StringValue(s))
def int(i): Some(IntValue(i))

resultset = Resultset([
  [str("foo"), int(123), None],
  [str("bar"), int(456), int(789)]
])

You still have lost any guarantees about the type, and you are basically back to dynamic types with any function working on this.
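
Since the guarantees are gone at the type level, they can only be checked at runtime. A rough sketch of such a check, in Python rather than Bosatsu, with plain Python values standing in for the wrapped `Value` cells; the function name `check_rectangular_and_typed` is made up for illustration:

```python
from typing import Any, List, Optional


def check_rectangular_and_typed(records: List[List[Optional[Any]]]) -> bool:
    """Hypothetical runtime check: every row has the same width and every
    column holds values of one type (None counts as a missing value)."""
    if not records:
        return True
    width = len(records[0])
    # Type seen so far in each column; None means "no value seen yet".
    column_types: List[Optional[type]] = [None] * width
    for row in records:
        if len(row) != width:
            return False
        for index, cell in enumerate(row):
            if cell is None:
                continue
            if column_types[index] is None:
                column_types[index] = type(cell)
            elif column_types[index] is not type(cell):
                return False
    return True
```

This is exactly the kind of check one would prefer the compiler to do, which is what the rest of this comment is about.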

@snoble you can add constraints like you want: proof that the type is either Int or String

struct Proof(convert: forall f. f[a] -> f[b])

#here's a nice puzzle:
def flip(prf: Proof[a, b]) -> Proof[b, a]: ???

# Value has one free type parameter: a
enum Value:
  IntV(v: Int, prf: Proof[Int, a])
  StringV(s: String, prf: Proof[String, a])

# here you know a == String, so we can make the proof, it is just identity
def str(s: String) -> Value[String]:
  StringV(s, Proof(\s -> s))

# here we see a == Int, so we can make the proof, it is just identity.
def int(i: Int) -> Value[Int]:
  IntV(i, Proof(\s -> s))

so, now you can work with Value[a] and convert to and from Int or String depending on the branch, and then get back to a.

I don’t know exactly what you want to do, but that might be helpful…

Now… onto what could be done to make it easier to program generically…

PureScript has a notion of row polymorphism. So, you can write functions that work on any struct that has at least some fields with some types:

def foo(rec: { age: Int, name: String | r }) -> String: ...

so, you can call foo for all types that have age and name fields.
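
For readers who have not seen this style, a loose analogue in Python is structural typing with `typing.Protocol`. This is only a sketch: structural subtyping is related to row polymorphism but not the same thing (there is no equivalent of the row variable `r` that carries the remaining fields). The names `HasAgeAndName`, `Person` and `Employee` are invented for the example.

```python
from dataclasses import dataclass
from typing import Protocol


class HasAgeAndName(Protocol):
    """Anything with at least an `age: int` and a `name: str`."""
    age: int
    name: str


def foo(rec: HasAgeAndName) -> str:
    # Only the two required fields are used; extra fields are ignored.
    return f"{rec.name} is {rec.age}"


@dataclass
class Person:  # hypothetical record with exactly the required fields
    age: int
    name: str


@dataclass
class Employee:  # hypothetical record with an extra field
    age: int
    name: str
    serial: int
```

Both `foo(Person(30, "Ann"))` and `foo(Employee(41, "Bob", 7))` are accepted, without either class declaring any relationship to `HasAgeAndName`.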

You can do this by hacking your own requirement into a struct:

struct Requirements(getAge: r -> Int, getName: r -> String)

def foo(hasReqs: Requirements[r], rec: r) -> String: ...

Then you specialize as needed:

struct FooRec(age: Int, name: String)
# using the syntax I will add in #188 
fooRecIsOk = Requirements(\FooRec(a, _) -> a, \FooRec(_, n) -> n)

# now we have a function FooRec -> String
fooFn = foo(fooRecIsOk)

We could do the same with Bar:
struct Bar(ageWithAnotherName: Int, serial: Int, nm: String)
barIsOk = Requirements(\Bar(a, _, _) -> a, \Bar(_, _, n) -> n)

fooBar = foo(barIsOk)

That’s a bit manual to encode, but the type system should be strong enough to keep everything going.
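
The same manual encoding can be sketched in Python, in case a runnable version helps. This is an illustration, not Bosatsu: Python has no automatic partial application, so `foo` is curried by hand here, and the snake_case names are my own renderings of the ones above.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

R = TypeVar("R")


@dataclass(frozen=True)
class Requirements(Generic[R]):
    """A "dictionary" of accessors proving R has an age and a name."""
    get_age: Callable[[R], int]
    get_name: Callable[[R], str]


def foo(reqs: Requirements[R]) -> Callable[[R], str]:
    # Curried by hand: supply the requirements first, get R -> str back.
    def describe(rec: R) -> str:
        return f"{reqs.get_name(rec)} is {reqs.get_age(rec)}"
    return describe


@dataclass(frozen=True)
class FooRec:
    age: int
    name: str


@dataclass(frozen=True)
class Bar:
    age_with_another_name: int
    serial: int
    nm: str


foo_rec_is_ok: Requirements[FooRec] = Requirements(lambda r: r.age, lambda r: r.name)
bar_is_ok: Requirements[Bar] = Requirements(lambda r: r.age_with_another_name, lambda r: r.nm)

foo_fn = foo(foo_rec_is_ok)  # FooRec -> str
foo_bar = foo(bar_is_ok)     # Bar -> str
```

Note that `Bar` needs no field called `age` or `name`; the accessors in `bar_is_ok` do the mapping.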

So, to put it together, you would have something like:

struct ResultSet(items: List[a], itemIsOkay: Requirements[a])

where you have a list of a, and Requirements is proof that each a has the properties you want.
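
As a rough Python sketch of that shape (again an analogue, not Bosatsu), with a hypothetical library function `names_older_than` that works for any record type as long as the requirements are supplied:

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, Tuple, TypeVar

A = TypeVar("A")


@dataclass(frozen=True)
class Requirements(Generic[A]):
    get_age: Callable[[A], int]
    get_name: Callable[[A], str]


@dataclass(frozen=True)
class ResultSet(Generic[A]):
    items: List[A]
    item_is_okay: Requirements[A]  # evidence that each A has age and name


def names_older_than(rs: ResultSet[A], min_age: int) -> List[str]:
    """Hypothetical generic function: it never inspects A directly,
    only through the accessors carried by the result set."""
    reqs = rs.item_is_okay
    return [reqs.get_name(item) for item in rs.items if reqs.get_age(item) > min_age]


# Rows are plain tuples here; the accessors say which position is which.
people: ResultSet[Tuple[int, str]] = ResultSet(
    items=[(30, "Ann"), (41, "Bob")],
    item_is_okay=Requirements(lambda r: r[0], lambda r: r[1]),
)
```

Different result sets in the same program can use different record types, since each one carries its own accessors.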

This is “dictionary passing” encoding of typeclasses @snoble

Does any of this help or is it at all interesting at least? 😉 I really appreciate you taking the time to share these concerns and I do hope we can find some nice (or at least nicer) solutions together.

snoble commented, Apr 3, 2019

That’s fascinating. Yeah, I ended up with the forall there because I couldn’t see how else to express the resulting type. I definitely would drive myself mad trying to do this in Scala.

I can definitely not think about this for a while and maybe another approach will appear. In the meantime I’ll write out the full API I was thinking of so that I can remember it in the future.