LATERAL join
Proposed experimental feature.
Work-in-progress: the issue description is being edited in-place.
A LATERAL join is like a foreach loop: it loops over the results from the left-hand side (LHS), the pattern before the LATERAL keyword, and executes the right-hand side (RHS) query pattern once for each row, with the variables from the LHS row in-scope during each RHS evaluation.
A regular join only executes the RHS once, and the variables from the LHS are only used for the join condition after evaluation of the left and right sub-patterns.
Another way to think of a lateral join is as a flatmap.
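The foreach/flatmap reading can be sketched in Python. This is only an illustrative model, not Jena code: solution rows are plain dicts, and `LABELS`, `first_label` and `lateral` are names invented for the example.

```python
# Illustrative model only: a row is a dict mapping variable names to values.
LABELS = {"ex:a": ["A", "Alpha"], "ex:b": ["B"]}  # hypothetical rdfs:label data

def first_label(row):
    """RHS: `SELECT * { ?s rdfs:label ?label } LIMIT 1`, with ?s taken from the LHS row."""
    labels = LABELS.get(row["s"], [])
    return [{"s": row["s"], "label": label} for label in labels[:1]]

def lateral(lhs_rows, rhs):
    """Flatmap: evaluate `rhs` once per LHS row and merge each result with that row."""
    return [{**row, **rhs_row} for row in lhs_rows for rhs_row in rhs(row)]

lhs = [{"s": "ex:a"}, {"s": "ex:b"}, {"s": "ex:c"}]
result = lateral(lhs, first_label)
# ex:a and ex:b each get exactly one label; ex:c has no label and drops out.
```

A regular join would instead evaluate the RHS a single time, so the LIMIT 1 would apply to the whole query and not to each subject.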
Examples:
## Get exactly one label for each subject in a row.
SELECT * {
  ?s ?p ?o
  LATERAL {
    SELECT * { ?s rdfs:label ?label } LIMIT 1
  }
}
## Get zero or one labels for each subject.
SELECT * {
  ?s ?p ?o
  LATERAL { OPTIONAL { SELECT * { ?s rdfs:label ?label } LIMIT 1 } }
}
Writing { OPTIONAL ... is the same as writing { {} OPTIONAL ..., where { } evaluates to the join identity, a table of one row of zero columns.
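A small Python model may help show why the join identity matters for the "zero or one labels" query. It is a sketch under simple assumptions (rows are dicts, tables are lists of rows); `compatible`, `join`, `left_join` and `IDENTITY` are names made up for the illustration.

```python
def compatible(a, b):
    """Two rows are compatible if they agree on every shared variable."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())

def join(left, right):
    """Inner join: merge every compatible pair of rows."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]

def left_join(left, right):
    """OPTIONAL: keep each left row even when nothing on the right matches."""
    out = []
    for a in left:
        matches = [{**a, **b} for b in right if compatible(a, b)]
        out.extend(matches if matches else [a])
    return out

IDENTITY = [{}]  # `{ }`: one row, zero columns

table = [{"label": "A"}]
```

Because the OPTIONAL is applied to the identity table, the RHS always yields exactly one row: either the label row or the empty row, so the subject from the LHS is never dropped.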
Syntax
The LATERAL keyword takes the graph pattern so far (from the { starting the current block) as its left-hand side and the { } block that follows as its right-hand side.
Possible addition: LATERAL ( ?var1 ?var2 ...) to specify certain variables to expose to the RHS. Other variables would be (inner) joined as usual. This may be an unnecessary feature.
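One possible reading of the variable-list idea, sketched in Python. The proposal does not define this behaviour, so everything here is an assumption: `lateral_exposing`, `rhs_limit_1` and `EDGES` are hypothetical, and the RHS is modelled as a function that can only correlate on the variables it is handed.

```python
def compatible(a, b):
    """Two rows are compatible if they agree on every shared variable."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())

def lateral_exposing(lhs_rows, rhs, exposed):
    """Evaluate `rhs` per LHS row, passing in only the `exposed` variables.

    RHS rows are then joined with the full LHS row in the ordinary way,
    so non-exposed shared variables are still checked for compatibility.
    """
    out = []
    for row in lhs_rows:
        visible = {v: row[v] for v in exposed if v in row}
        for rhs_row in rhs(visible):
            if compatible(row, rhs_row):
                out.append({**row, **rhs_row})
    return out

EDGES = [("ex:a", "ex:y"), ("ex:a", "ex:x")]  # hypothetical (?s, ?o) data

def rhs_limit_1(visible):
    """Models `{ SELECT * { ?s ex:p ?o } LIMIT 1 }`; list order stands in for LIMIT's choice."""
    matches = [{"s": s, "o": o} for s, o in EDGES
               if visible.get("s", s) == s and visible.get("o", o) == o]
    return matches[:1]

lhs = [{"s": "ex:a", "o": "ex:x"}]
only_s = lateral_exposing(lhs, rhs_limit_1, exposed=["s"])
both = lateral_exposing(lhs, rhs_limit_1, exposed=["s", "o"])
```

With only ?s exposed, the LIMIT 1 picks a row before ?o is compared, so the later join can discard it. With both variables exposed, the limit is applied after the correlation on ?o.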
Scope
A sub-select may have variables of the same name that are not lateral-joined to a variable of the same name from the LHS.
SELECT * {
  ?s ?p ?o
  LATERAL {
    SELECT ?label { ?s rdfs:label ?label } LIMIT 1
  }
}
The inner ?s in the SELECT ?label is not the outer ?s because the SELECT ?label does not pass out ?s. As a sub-query, the ?s could be any name except ?label for the same results.
This is the same as for a sub-query in any other context.
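The effect of projection on scope can be illustrated with a toy model in Python. It is a sketch, not an implementation: `sub_select`, `lateral` and `LABELS` are invented names, and list order stands in for whichever solution LIMIT 1 would return.

```python
LABELS = [("ex:a", "A"), ("ex:b", "B")]  # hypothetical (?s, ?label) pairs

def sub_select(row, project, limit=1):
    """Model `SELECT <project> { ?s rdfs:label ?label } LIMIT 1` inside LATERAL.

    Only projected variables are visible outside the sub-query, so the LHS row
    constrains ?s only when ?s is projected.
    """
    out = []
    for s, label in LABELS:
        inner = {"s": s, "label": label}
        if "s" in project and "s" in row and row["s"] != s:
            continue  # ?s is shared with the LHS, so it must match the current row
        out.append({v: inner[v] for v in project})
    return out[:limit]

def lateral(lhs_rows, rhs):
    """Evaluate `rhs` once per LHS row and merge each result with that row."""
    return [{**row, **r} for row in lhs_rows for r in rhs(row)]

lhs = [{"s": "ex:a"}, {"s": "ex:b"}]
hidden = lateral(lhs, lambda row: sub_select(row, ["label"]))       # SELECT ?label
shared = lateral(lhs, lambda row: sub_select(row, ["s", "label"]))  # SELECT ?s ?label
```

With `SELECT ?label`, every subject receives the same arbitrary label, because the inner ?s is a different variable. Projecting ?s restores the per-subject behaviour.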
There needs to be a new syntax restriction: there can be no variable introduced by AS (BIND, or sub-query) or VALUES in-scope at the top level of the LATERAL RHS that has the same name as any in-scope variable from the LHS.
## ** Illegal **
SELECT * {
  ?s ?p ?o
  LATERAL { BIND( 123 AS ?o) }
}
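The restriction amounts to a disjointness check between two sets of variable names. The sketch below assumes those sets have already been computed from the parsed query, which is the harder part and is not shown; `check_lateral_rhs` is a hypothetical helper, not a Jena API.

```python
def check_lateral_rhs(lhs_in_scope, rhs_assigned):
    """Reject an RHS that assigns (via AS or VALUES) a variable already in scope on the LHS."""
    clash = set(lhs_in_scope) & set(rhs_assigned)
    if clash:
        raise ValueError(f"LATERAL RHS re-assigns LHS variable(s): {sorted(clash)}")

# `?s ?p ?o LATERAL { BIND(123 AS ?o) }`: ?o is in scope on the LHS and assigned on the RHS.
try:
    check_lateral_rhs({"s", "p", "o"}, {"o"})
    rejected = False
except ValueError:
    rejected = True
```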
In ARQ, LET would work. LET for a variable that is bound acts like a filter.
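A minimal model of that behaviour on a single row, written in Python. It assumes that a row is kept when the existing binding equals the assigned value and dropped otherwise; `let_assign` is an illustrative name, not an ARQ function.

```python
def let_assign(row, var, value):
    """Model of LET on one row: returns the resulting rows (zero or one)."""
    if var not in row:
        return [{**row, var: value}]           # unbound: behaves like an assignment
    return [row] if row[var] == value else []  # bound: behaves like a filter
```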
Evaluation
Substituting variables from the LHS into the RHS (with the same restrictions), then executing the pattern, gives the evaluation of LATERAL.
Notes
There is a similarity to FILTER NOT EXISTS/EXISTS, expressed as the not-legal FILTER ( ASK { pattern } ), where the variables of the row being filtered are available to “pattern”. This is similar to an SQL correlated subquery.
Elsewhere
- Jena’s SERVICE loop:
- Oxigraph: oxigraph/issues/267, oxigraph/pull/274
- https://docs.stardog.com/query-stardog/stored-query-service#correlated-subqueries
- https://www.postgresql.org/docs/current/queries-table-expressions.html#QUERIES-LATERAL
- https://dev.mysql.com/doc/refman/8.0/en/lateral-derived-tables.html
Spec updates
Syntax
LATERAL is added to the SPARQL grammar at rule [[56] GraphPatternNotTriples](https://www.w3.org/TR/sparql11-query/#rGraphPatternNotTriples). As a syntax form, it is similar to OPTIONAL.
[56] GraphPatternNotTriples ::= GroupOrUnionGraphPattern | OptionalGraphPattern | LateralGraphPattern | ...
[57] OptionalGraphPattern ::= 'OPTIONAL' GroupGraphPattern
[ ] LateralGraphPattern ::= 'LATERAL' GroupGraphPattern
Algebra
The new algebra operator is lateral, which takes two expressions.
SELECT * {
  ?s ?p ?o
  LATERAL
    { ?a ?b ?c }
}
is translated to:
(lateral
  (bgp (triple ?s ?p ?o))
  (bgp (triple ?a ?b ?c)))
Evaluation
To evaluate lateral:
- Evaluate the first argument (left-hand side from syntax) to get a multiset of solution mappings.
- For each solution mapping (“row”):
  - inject variable bindings into the second argument
  - evaluate this pattern
  - add to results
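The steps above can be sketched for the simplest case, where both arguments are basic graph patterns. This is a toy evaluator in Python, not the proposal's algorithm: `inject` is reduced to plain substitution into triple patterns, which is enough for BGPs but not for the general case, and `GRAPH`, `eval_bgp` and `eval_lateral` are names made up for the sketch.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def eval_bgp(graph, pattern):
    """Evaluate a list of triple patterns against a list of triples."""
    rows = [{}]
    for tp in pattern:
        next_rows = []
        for r in rows:
            for triple in graph:
                candidate = dict(r)
                for p_term, g_term in zip(tp, triple):
                    if is_var(p_term):
                        # Bind the variable, or check it against an earlier binding.
                        if candidate.setdefault(p_term, g_term) != g_term:
                            break
                    elif p_term != g_term:
                        break
                else:
                    next_rows.append(candidate)
        rows = next_rows
    return rows

def inject(pattern, row):
    """Simplified inject: fix the variables bound in `row` to their values."""
    return [tuple(row.get(t, t) if is_var(t) else t for t in tp) for tp in pattern]

def eval_lateral(graph, lhs_pattern, rhs_pattern):
    results = []
    for row in eval_bgp(graph, lhs_pattern):        # evaluate the first argument
        injected = inject(rhs_pattern, row)         # inject the row's bindings
        for rhs_row in eval_bgp(graph, injected):   # evaluate this pattern
            results.append({**row, **rhs_row})      # add to results
    return results

GRAPH = [("ex:a", "rdfs:label", "A"),
         ("ex:b", "rdfs:label", "B"),
         ("ex:a", "ex:knows", "ex:b")]
result = eval_lateral(GRAPH,
                      [("?s", "ex:knows", "?o")],
                      [("?o", "rdfs:label", "?label")])
```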
Outline:
Definition: Lateral
Let Ω be a multiset of solution mappings. We define:
Lateral(Ω, P) = { μ | union of Ω1 where
    foreach μ1 in Ω:
        pattern2 = inject(P, μ1)
        Ω1 = eval(D(G), pattern2)
        result Ω1
}
where inject is the corrected substitute operation.
An alternative style is to define Lateral more like “evaluate P such that μ is in-scope” in some way, rather than rely on inject, which is a mechanism.
Definition: Evaluation of Lateral
eval(D(G), Lateral(P1, P2)) = Lateral(eval(D(G), P1), P2)
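Read together, the outline and the evaluation rule can be written compactly. The following is one possible rendering and not wording from the proposal: it assumes that "union of Ω1" means multiset union (written ⊎ here) and uses inject as named in the outline.

```latex
\mathrm{Lateral}(\Omega, P) \;=\; \biguplus_{\mu_1 \in \Omega} \mathrm{eval}\bigl(D(G),\ \mathrm{inject}(P, \mu_1)\bigr)

\mathrm{eval}\bigl(D(G),\ \mathrm{Lateral}(P_1, P_2)\bigr) \;=\; \mathrm{Lateral}\bigl(\mathrm{eval}(D(G), P_1),\ P_2\bigr)
```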
Top GitHub Comments
Have to be careful here.
It’s the operator that determines the evaluation; there isn’t some policy for the whole expression. It is just that most current algebra operators are depth-first evaluation (AKA functions), and we all say “evaluated bottom-up”.
(join A B) is bottom-up because join says “evaluate the arguments separately, then join the resulting tables”. It is the “evaluate the arguments then …” which makes it a well-behaved function; it is doing a depth-first walk “bottom-up”.
(lateral A B) says “eval A, then loop on its rows, evaluating B such that the row from A is available for variables”. Defining the “row being available to A” as careful injection means eval A[row] is normal SPARQL evaluation, not a special case for inside LATERAL.
The proposal is that the row is injected (by the corrected substitute operation): the variable name is still there, but its binding is fixed by having a BIND just before it. There are places that require a variable, e.g. SELECT ?var or FILTER(bound(?var)), where replacing a variable by its value fails.
There is a discussion point about whether “eval B with row from A” should or should not use the in-scope rules for variables:
Does the ?s in { ?s rdfs:label ?label } connect to the ?s before the LATERAL?
From a SPARQL POV, that sub-query can otherwise be SELECT ?label { ?z rdfs:label ?label } LIMIT 1 or SELECT ?label { [] rdfs:label ?label } LIMIT 1 for the same results. ?label is unrelated to the LHS triple because ?s isn’t in the SELECT.
SELECT ?s ?label { ?s rdfs:label ?label } LIMIT 1 does make the ?s see the ?s of ?s ?p ?o. Ditto SELECT * { ?s rdfs:label ?label } LIMIT 1.
Just for LATERAL, it could be “no scope rules” and the inner ?s does see the LHS ?s.
At the moment, I’m more inclined to the scoping version so that there isn’t an eval special case of “inside LATERAL” and so that developing big queries piece-by-piece is more predictable (arguably), but it does cause a “surprise” case. Another reason is that special cases tend to have complicated consequences.
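The earlier remark that replacing a variable by its value fails for forms such as FILTER(bound(?var)) can be made concrete with a toy pattern representation in Python. The tuple encoding and the names `naive_substitute`, `inject` and `is_well_formed` are assumptions made for this sketch. The `inject` here only mirrors the idea of placing a BIND before the pattern and is not the corrected substitute operation itself.

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def naive_substitute(elements, row):
    """Replace every variable occurrence by its value, even where a variable is required."""
    return [tuple(row.get(t, t) if is_var(t) else t for t in el) for el in elements]

def inject(elements, row):
    """Keep variable names in place and fix their values with leading BIND elements."""
    mentioned = {t for el in elements for t in el[1:] if is_var(t)}
    binds = [("bind", row[v], v) for v in sorted(mentioned) if v in row]
    return binds + list(elements)

def is_well_formed(elements):
    """`bound(...)` needs a variable argument, not an RDF term."""
    return all(is_var(el[1]) for el in elements if el[0] == "filter_bound")

# RHS: { ?s rdfs:label ?label FILTER(bound(?s)) }, with ?s supplied by the LHS row.
rhs = [("triple", "?s", "rdfs:label", "?label"), ("filter_bound", "?s")]
row = {"?s": "ex:a"}

broken = naive_substitute(rhs, row)  # yields bound(ex:a), which is not legal
fixed = inject(rhs, row)             # yields BIND(ex:a AS ?s) followed by the original pattern
```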
When/if we have query templates and parameterization, unconditionally replacing ?s by an RDF term makes sense and is easier for users to comprehend.
For interoperability, it would be good if there was agreement on whether variable lists are unsupported, optional or mandatory. They are not strictly necessary, but someone may find them useful, e.g. for extra control when copy/pasting a long graph pattern into the RHS of a lateral join, or maybe even just for clarity. Conversely, optional variable lists would give more brevity, especially when dealing with simple graph patterns.
To me it seems an ‘optional’ variable list would be most convenient to use.
Also, proper lateral join support would solve an issue with the service enhancer which @LorenzBuehmann found out:
- OPTIONAL { SERVICE <loop:> { X } } becomes OpConditional, which results in an exception during algebra-to-syntax reversal in OpAsQuery. So the service enhancer attempts to combine OPTIONAL { LATERAL {} } into one construct, but that cannot be handled properly with the current machinery.
- The alternative is SERVICE <loop:> { OPTIONAL { X } }, but this has an undesired side effect when e.g. combining it with SERVICE <cache:>: the cache indexes results based on the parameters (algebra, joining input binding, row number). As a consequence, the graph pattern X is of course considered different from OPTIONAL { X }, which possibly results in cache misses.
So proper lateral is very welcome!