LATERAL join
Proposed experimental feature.
Work-in-progress: the issue description is being edited in-place.
A LATERAL join is like a foreach loop: it loops over the results from the left-hand side (LHS), the pattern before the LATERAL keyword, and executes the right-hand side (RHS) query pattern once for each row, with the variables from the LHS in-scope during each RHS evaluation.
A regular join only executes the RHS once, and the variables from the LHS are only used for the join condition after evaluation of the left and right sub-patterns.
Another way to think of a lateral join is as a flatmap.
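The foreach/flatmap reading can be sketched in Python. This is only an illustration, not part of the proposal: rows are modelled as dicts, and `LABELS`, `first_label`, `join` and `lateral` are hypothetical names standing in for the data and for `SELECT * { ?s rdfs:label ?label } LIMIT 1`.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]

def compatible(a: Row, b: Row) -> bool:
    # Two rows are compatible if they agree on every shared variable.
    return all(b[k] == v for k, v in a.items() if k in b)

def join(left: List[Row], right: List[Row]) -> List[Row]:
    # Regular join: both sides were already evaluated independently.
    return [{**l, **r} for l in left for r in right if compatible(l, r)]

def lateral(left: List[Row], rhs: Callable[[Row], List[Row]]) -> List[Row]:
    # Flatmap: the RHS is evaluated once per LHS row and can see that row.
    return [{**l, **r} for l in left for r in rhs(l)]

# Hypothetical label data: subject -> labels.
LABELS = {"ex:a": ["A", "Alpha"], "ex:b": ["B"]}

def first_label(row: Row) -> List[Row]:
    # Stand-in for the LIMIT 1 sub-select, with ?s taken from the LHS row.
    return [{"s": row["s"], "label": lab} for lab in LABELS.get(row["s"], [])[:1]]

lhs = [{"s": "ex:a"}, {"s": "ex:b"}, {"s": "ex:c"}]

# LATERAL: one label for each subject that has one.
per_row = lateral(lhs, first_label)

# Regular join: the LIMIT 1 sub-select runs once, so only one row survives.
rhs_once = [{"s": "ex:a", "label": "A"}]
joined = join(lhs, rhs_once)
```

Here `per_row` holds a row for `ex:a` and one for `ex:b`, while `joined` holds only the `ex:a` row, which is the difference between the two forms of join described above.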
Examples:
## Get exactly one label for each subject in a row.
SELECT * {
?s ?p ?o
LATERAL {
SELECT * { ?s rdfs:label ?label } LIMIT 1
}
}
## Get zero or one labels for each subject.
SELECT * {
?s ?p ?o
LATERAL { OPTIONAL { SELECT * { ?s rdfs:label ?label } LIMIT 1} }
}
{ OPTIONAL ... is the same as writing { {} OPTIONAL ....
{ } evaluates to the join identity, a table of one row of zero columns.
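As a small sketch of why this matters for the "zero or one labels" example (assuming rows modelled as Python dicts; `UNIT`, `join` and `left_join` are illustrative names, not from the proposal): joining with the one-row, zero-column table changes nothing, and a left join from it still yields one row when the optional part has no match.

```python
from typing import Dict, List

Row = Dict[str, str]

UNIT: List[Row] = [{}]  # the join identity: one row, zero columns

def compatible(a: Row, b: Row) -> bool:
    # Rows are compatible if they agree on every shared variable.
    return all(b[k] == v for k, v in a.items() if k in b)

def join(left: List[Row], right: List[Row]) -> List[Row]:
    return [{**l, **r} for l in left for r in right if compatible(l, r)]

def left_join(left: List[Row], right: List[Row]) -> List[Row]:
    # OPTIONAL-style join: keep the left row when nothing on the right matches.
    out: List[Row] = []
    for l in left:
        matches = [{**l, **r} for r in right if compatible(l, r)]
        out.extend(matches if matches else [dict(l)])
    return out

table = [{"s": "ex:a"}, {"s": "ex:b"}]

same_left = join(UNIT, table)                     # equals table
same_right = join(table, UNIT)                    # equals table
no_label = left_join(UNIT, [])                    # one empty row, not zero rows
one_label = left_join(UNIT, [{"label": "A"}])     # one row carrying the label
```

Because `no_label` still contains one row, a subject without a label is kept by `LATERAL { OPTIONAL { ... } }` instead of being dropped.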
Syntax
The LATERAL keyword takes the graph pattern so far (from the { starting the current block) as its left-hand side and the { } block that follows as its right-hand side.
Possible addition: LATERAL ( ?var1 ?var2 ...) to specify certain variables to expose to the RHS. Other variables would be (inner)joined as usual. This may be an unnecessary feature.
Scope
A sub-select may have variables of the same name that are not lateral-joined to a variable of the same name from the LHS.
SELECT * {
?s ?p ?o
LATERAL {
SELECT ?label { ?s rdfs:label ?label } LIMIT 1
}
}
The inner ?s in the SELECT ?label is not the outer ?s because the SELECT ?label does not pass out ?s. As a sub-query the ?s could be any name except ?label for the same results.
This is the same situation as a sub-query in other situations.
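A rough Python sketch of this scoping behaviour, under the assumption that an outer binding only reaches variables the sub-select projects. `LABELS` and `sub_select` are hypothetical, and the fixed list order stands in for whatever order LIMIT 1 happens to see.

```python
from typing import Dict, List

Row = Dict[str, str]

# Hypothetical (subject, label) data.
LABELS = [("ex:a", "A"), ("ex:b", "B")]

def sub_select(project: List[str], outer_row: Row, limit: int = 1) -> List[Row]:
    # Inner pattern { ?s rdfs:label ?label } evaluated over the data.
    rows = [{"s": s, "label": lab} for s, lab in LABELS]
    # Assumption: only projected variables are connected to the outer row.
    visible = {k: v for k, v in outer_row.items() if k in project}
    rows = [r for r in rows if all(r[k] == v for k, v in visible.items())]
    return [{k: r[k] for k in project} for r in rows][:limit]

outer = {"s": "ex:b"}

# SELECT ?label ...: the inner ?s is hidden, so any label may come back.
hidden = sub_select(["label"], outer)

# SELECT ?s ?label ...: the inner ?s is the outer ?s.
exposed = sub_select(["s", "label"], outer)
```

In this sketch `hidden` returns the label of `ex:a` even though the outer row is about `ex:b`, whereas `exposed` returns the label of `ex:b`.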
There needs to be a new syntax restriction: there can be no variable introduced by AS (BIND, or sub-query) or VALUES in-scope at the top level of the LATERAL RHS that has the same name as any in-scope variable from the LHS.
## ** Illegal **
SELECT * {
?s ?p ?o
LATERAL { BIND( 123 AS ?o) }
}
In ARQ, LET would work.
LET for a variable that is bound acts like a filter.
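The restriction amounts to a set-intersection check. A minimal sketch, assuming a parser has already collected the two sets of variable names; `check_lateral_rhs` and its parameters are hypothetical and not taken from any implementation:

```python
from typing import Set

def check_lateral_rhs(lhs_in_scope: Set[str], rhs_assigned: Set[str]) -> None:
    # rhs_assigned: variables introduced by AS (BIND, sub-query) or VALUES
    # at the top level of the LATERAL right-hand side.
    clash = lhs_in_scope & rhs_assigned
    if clash:
        names = ", ".join("?" + v for v in sorted(clash))
        raise ValueError("LATERAL RHS assigns LHS variable(s): " + names)

# Legal: the RHS only introduces ?label.
check_lateral_rhs({"s", "p", "o"}, {"label"})

# Illegal, as in the BIND( 123 AS ?o ) example: ?o is already in scope.
try:
    check_lateral_rhs({"s", "p", "o"}, {"o"})
    rejected = False
except ValueError:
    rejected = True
```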
Evaluation
Substituting variables from the LHS into the RHS (with the same restrictions), then executing the pattern, gives the evaluation of LATERAL.
Notes
There is a similarity to filter NOT EXISTS/EXISTS expressed as the not-legal FILTER ( ASK { pattern } ) where the variables of the row being filtered are available to “pattern”. This is similar to an SQL correlated subquery.
Elsewhere
- Jena’s SERVICE loop:
- Oxigraph: oxigraph/issues/267, oxigraph/pull/274
- https://docs.stardog.com/query-stardog/stored-query-service#correlated-subqueries
- https://www.postgresql.org/docs/current/queries-table-expressions.html#QUERIES-LATERAL
- https://dev.mysql.com/doc/refman/8.0/en/lateral-derived-tables.html
Spec updates
Syntax
LATERAL is added to the SPARQL grammar at rule [[56] GraphPatternNotTriples](https://www.w3.org/TR/sparql11-query/#rGraphPatternNotTriples). As a syntax form, it is similar to OPTIONAL.
[56] GraphPatternNotTriples ::= GroupOrUnionGraphPattern | OptionalGraphPattern | LateralGraphPattern | ...
[57] OptionalGraphPattern ::= 'OPTIONAL' GroupGraphPattern
[ ] LateralGraphPattern ::= 'LATERAL' GroupGraphPattern
Algebra
The new algebra operator is lateral, which takes two expressions:
SELECT * {
?s ?p ?o
LATERAL
{ ?a ?b ?c }
}
is translated to:
(lateral
(bgp (triple ?s ?p ?o))
(bgp (triple ?a ?b ?c)))
Evaluation
To evaluate lateral:
- Evaluate the first argument (left-hand side from syntax) to get a multiset of solution mappings.
- For each solution mapping (“row”):
  - inject variable bindings into the second argument;
  - evaluate this pattern;
  - add to results.
Outline:
Definition: Lateral
Let Ω be a multiset of solution mappings. We define:
Lateral(Ω, P) = { μ | union of Ω1 where
foreach μ1 in Ω:
pattern2 = inject(P, μ1)
Ω1 = eval(D(G), pattern2)
result Ω1
}
where inject is the corrected substitute operation.
An alternative style is to define Lateral more like “evaluate P such that μ is in-scope” in some way, rather than rely on inject which is a mechanism.
Definition: Evaluation of Lateral
eval(D(G), Lateral(P1, P2)) = Lateral(eval(D(G), P1), P2)
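The outline can be exercised over a tiny in-memory graph. The following is a hedged sketch, not the proposed definition: it only handles basic graph patterns, and its `inject` is plain substitution of bound variables (the binding is re-attached to each result row) instead of the corrected substitute operation, which keeps the variable. The names `eval_bgp`, `inject`, `lateral` and the graph `G` are illustrative.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]
Triple = Tuple[str, str, str]

def is_var(term: str) -> bool:
    return term.startswith("?")

def inject(pattern: List[Triple], mu: Row) -> List[Triple]:
    # Simplification: replace each variable bound in mu by its value.
    return [tuple(mu.get(t, t) if is_var(t) else t for t in tp) for tp in pattern]

def eval_bgp(graph: List[Triple], pattern: List[Triple]) -> List[Row]:
    # Naive BGP matching: extend each partial row with each matching triple.
    rows: List[Row] = [{}]
    for tp in pattern:
        next_rows: List[Row] = []
        for row in rows:
            for triple in graph:
                new = dict(row)
                ok = True
                for term, value in zip(tp, triple):
                    if is_var(term):
                        if new.setdefault(term, value) != value:
                            ok = False
                            break
                    elif term != value:
                        ok = False
                        break
                if ok:
                    next_rows.append(new)
        rows = next_rows
    return rows

def lateral(omega: List[Row], pattern: List[Triple], graph: List[Triple]) -> List[Row]:
    out: List[Row] = []
    for mu1 in omega:                       # foreach μ1 in Ω
        pattern2 = inject(pattern, mu1)     # pattern2 = inject(P, μ1)
        for mu2 in eval_bgp(graph, pattern2):
            out.append({**mu1, **mu2})      # keep μ1's bindings in the result
    return out

G = [
    ("ex:a", "rdf:type", "ex:T"),
    ("ex:a", "rdfs:label", "A"),
    ("ex:b", "rdf:type", "ex:T"),
]

omega = eval_bgp(G, [("?s", "rdf:type", "ex:T")])
result = lateral(omega, [("?s", "rdfs:label", "?label")], G)
```

With this data `omega` has rows for `ex:a` and `ex:b`, and `result` contains only the `ex:a` row with its label, since `ex:b` has no `rdfs:label` triple.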
Have to be careful here.
It’s the operator that determines the evaluation; there isn’t some policy for the whole expression. It’s just that most current algebra operators use depth-first evaluation (AKA functions), and we all say “evaluated bottom-up”.
(join A B) is bottom up because join says “evaluate the arguments separately then join the resulting tables”. It is the “evaluate the arguments then …” which makes it a well-behaved function; it is doing a depth first walk “bottom-up”.
(lateral A B) says “eval A, then loop on its rows, evaluating B such that the row from A is available for variables”. Defining the “row being available to A” as careful injection means eval A[row] is normal SPARQL evaluation, not a special case for inside LATERAL.
The proposal is that the row is injected (by the corrected substitute operation): the variable name is still there but its binding is fixed by having a BIND just before it. There are places that require a variable, e.g. SELECT ?var or FILTER(bound(?var)), where replacing a variable by its value fails.
There is a discussion point about whether “eval B with row from A” should or should not use the in-scope rules for variables:
Does the ?s in { ?s rdfs:label ?label } connect to the ?s before the LATERAL?
From a SPARQL POV, that sub-query can otherwise be SELECT ?label { ?z rdfs:label ?label } LIMIT 1 or SELECT ?label { [] rdfs:label ?label } LIMIT 1 for the same results. ?label is unrelated to the LHS triple because ?s isn’t in the SELECT.
SELECT ?s ?label { ?s rdfs:label ?label } LIMIT 1 does make the ?s see the ?s of ?s ?p ?o. Ditto SELECT * { ?s rdfs:label ?label } LIMIT 1.
Just for LATERAL, it could be “no scope rules” and the inner ?s does see the LHS ?s.
At the moment, I’m more inclined to the scoping version so that there isn’t an eval special case of “inside LATERAL” and so that developing big queries piece-by-piece is more predictable (arguably), but it does cause a “surprise” case. Another reason is that special cases tend to have complicated consequences.
When/if we have query template and parameterization, unconditionally replacing ?s by an RDF term makes sense and is easier for users to comprehend.
For interoperability, it would be good if there was agreement for whether variable lists are unsupported, optional or mandatory. They are not strictly necessary, but someone may find this useful e.g. for extra control when copy/pasting a long graph pattern into the RHS of a lateral join, or maybe even just for clarity. Conversely, optional variable lists would give more brevity, especially when dealing with simple graph patterns.
To me it seems an ‘optional’ variable list would be most convenient to use.
Also, proper lateral join support would solve an issue with the service enhancer which @LorenzBuehmann found out:
- OPTIONAL { SERVICE <loop:> { X } } becomes OpConditional which results in an exception during algebra-to-syntax reversal in OpAsQuery. So the service enhancer attempts to combine OPTIONAL { LATERAL {} } into one construct but that cannot be handled properly with the current machinery.
- SERVICE <loop:> { OPTIONAL { X } } but this has an undesired side effect when e.g. combining this with SERVICE <cache:>: The cache indexes results based on the parameters (algebra, joining input binding, row number). As a consequence, the graph pattern X is of course considered different from OPTIONAL { X } which possibly results in cache misses.
So proper lateral is very welcome!