
LATERAL join

Proposed experimental feature.

Work-in-progress: the issue description is being edited in-place.

A LATERAL join is like a foreach loop: it loops over the results of the left-hand side (LHS), which is the pattern before the LATERAL keyword, and executes the right-hand side (RHS) query pattern once for each row, with the variables from the LHS in scope during each RHS evaluation.

A regular join executes the RHS only once, independently of the LHS; the LHS variables are used only in the join condition, after both the left and right sub-patterns have been evaluated.

Another way to think of a lateral join is as a flatmap.
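The foreach/flatmap reading can be sketched in Python. This is a minimal model, not Jena code: solution mappings are assumed to be plain dicts from variable name to RDF term (terms written as strings), the RHS is modelled as a Python function of the current LHS row, and `join`, `lateral` and `labels` are illustrative names. With LIMIT 1 on the RHS, a regular join applies the limit once overall, whereas the lateral version applies it once per LHS row.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]  # a solution mapping: variable name -> RDF term


def compatible(a: Row, b: Row) -> bool:
    """Two mappings are compatible if they agree on every shared variable."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def join(left: List[Row], right: List[Row]) -> List[Row]:
    """Regular join: both sides were evaluated independently, then merged."""
    return [{**lrow, **rrow} for lrow in left for rrow in right if compatible(lrow, rrow)]


def lateral(left: List[Row], rhs: Callable[[Row], Iterable[Row]]) -> List[Row]:
    """Lateral join as a flatmap: the RHS is evaluated once per LHS row,
    with that row available to it."""
    return [{**row, **out} for row in left for out in rhs(row)]


TRIPLES = [
    ("ex:a", "rdfs:label", "A1"),
    ("ex:a", "rdfs:label", "A2"),
    ("ex:b", "rdfs:label", "B1"),
]


def labels(subject=None) -> List[Row]:
    """{ ?s rdfs:label ?label }, optionally with ?s already fixed."""
    return [{"s": s, "label": o} for s, p, o in TRIPLES
            if p == "rdfs:label" and subject in (None, s)]


left = [{"s": "ex:a"}, {"s": "ex:b"}]

# Regular join: the RHS (with LIMIT 1) is evaluated once, independently of the LHS.
print(join(left, labels()[:1]))
# [{'s': 'ex:a', 'label': 'A1'}]

# Lateral: the RHS (with LIMIT 1) is evaluated once per LHS row.
print(lateral(left, lambda row: labels(row["s"])[:1]))
# [{'s': 'ex:a', 'label': 'A1'}, {'s': 'ex:b', 'label': 'B1'}]
```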

Examples:

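## Get exactly one label for the subject of each row
## (rows whose subject has no label are dropped).
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * {
   ?s ?p ?o
   LATERAL {
     SELECT * { ?s rdfs:label ?label } LIMIT 1
   }
}

## Get zero or one labels for the subject of each row.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * {
   ?s ?p ?o
   LATERAL { OPTIONAL { SELECT * { ?s rdfs:label ?label } LIMIT 1 } }
}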

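{ OPTIONAL ... } is the same as writing { {} OPTIONAL ... }. The empty group { } evaluates to the join identity: a table with one row and zero columns. So when the optional part has no match, the RHS still yields one (empty) row and the LHS row is kept.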
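A small Python sketch of why this gives "zero or one" labels per LHS row, assuming dict-based solution mappings; `left_join` is a simplified OPTIONAL that ignores the filter expression SPARQL's LeftJoin also carries.

```python
from typing import Dict, List

Row = Dict[str, str]

JOIN_IDENTITY: List[Row] = [{}]  # one row, zero columns: what { } evaluates to


def compatible(a: Row, b: Row) -> bool:
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def join(left: List[Row], right: List[Row]) -> List[Row]:
    return [{**x, **y} for x in left for y in right if compatible(x, y)]


def left_join(left: List[Row], right: List[Row]) -> List[Row]:
    """OPTIONAL without a filter: keep each left row even if nothing matches."""
    out: List[Row] = []
    for x in left:
        matches = [{**x, **y} for y in right if compatible(x, y)]
        out.extend(matches if matches else [x])
    return out


rows = [{"label": "A1"}]
print(join(JOIN_IDENTITY, rows))       # [{'label': 'A1'}]  (identity)
print(left_join(JOIN_IDENTITY, rows))  # [{'label': 'A1'}]  (one label)
print(left_join(JOIN_IDENTITY, []))    # [{}]               (zero labels, row kept)
```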

Syntax

The LATERAL keyword takes the graph pattern so far (from the { that starts the current block) as its left-hand side and the { } block that follows it as its right-hand side.

Possible addition: LATERAL ( ?var1 ?var2 ...) to specify which variables are exposed to the RHS. Other variables would be inner-joined as usual. This may be an unnecessary feature.
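If the variable list were added, one possible reading can be sketched in Python. Everything here is hypothetical (`lateral_exposing`, `label_pattern` and the data are illustrative): only the listed variables are injected into the RHS, and any other shared variable is inner-joined after the RHS has been evaluated. With a LIMIT in the RHS, the choice of exposed variables changes the result.

```python
from typing import Callable, Dict, List, Set

Row = Dict[str, str]

LABELS = [("ex:a", "A1"), ("ex:a", "A2"), ("ex:b", "B1")]


def compatible(a: Row, b: Row) -> bool:
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def label_pattern(bound: Row) -> List[Row]:
    """{ SELECT * { ?s rdfs:label ?o } LIMIT 1 } with the `bound` variables injected."""
    rows = [{"s": s, "o": o} for s, o in LABELS
            if bound.get("s", s) == s and bound.get("o", o) == o]
    return rows[:1]


def lateral_exposing(left: List[Row], rhs: Callable[[Row], List[Row]],
                     exposed: Set[str]) -> List[Row]:
    """Hypothetical LATERAL (?v ...): only `exposed` variables are passed to
    the RHS; other shared variables are inner-joined afterwards."""
    out: List[Row] = []
    for row in left:
        visible = {v: t for v, t in row.items() if v in exposed}
        out.extend({**row, **r} for r in rhs(visible) if compatible(row, r))
    return out


left = [{"s": "ex:a", "o": "A2"}]
print(lateral_exposing(left, label_pattern, {"s"}))       # []
print(lateral_exposing(left, label_pattern, {"s", "o"}))  # [{'s': 'ex:a', 'o': 'A2'}]
```

Exposing only ?s lets the LIMIT pick "A1" first, which then fails the inner join on ?o; exposing both restricts the RHS before the LIMIT applies.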

Scope

A sub-select in the RHS may use a variable with the same name as an LHS variable without that variable being lateral-joined to the LHS one.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * {
   ?s ?p ?o
   LATERAL {
     SELECT ?label { ?s rdfs:label ?label } LIMIT 1
   }
}

The inner ?s in SELECT ?label is not the outer ?s, because SELECT ?label does not project ?s. As a sub-query, ?s could be renamed to anything except ?label with the same results.

This is the same scoping as for a sub-query anywhere else.
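The scoping rule can be illustrated with a Python sketch, assuming that only the variables a sub-select projects are connected to the LHS row (the scoping version of LATERAL). `inner`, `lateral_subselect` and the data are hypothetical; `inner` stands for { ?s rdfs:label ?label } LIMIT 1.

```python
from typing import Callable, Dict, List, Set

Row = Dict[str, str]

LABELS = [("ex:a", "A1"), ("ex:a", "A2"), ("ex:b", "B1")]


def inner(bound: Row) -> List[Row]:
    """{ ?s rdfs:label ?label } LIMIT 1, with any injected variables fixed."""
    rows = [{"s": s, "label": text} for s, text in LABELS
            if bound.get("s", s) == s and bound.get("label", text) == text]
    return rows[:1]


def lateral_subselect(left: List[Row], projected: Set[str],
                      sub: Callable[[Row], List[Row]]) -> List[Row]:
    """Scoping version: only projected variables connect to the LHS row."""
    out: List[Row] = []
    for row in left:
        visible = {v: t for v, t in row.items() if v in projected}
        for r in sub(visible):
            out.append({**row, **{v: r[v] for v in projected if v in r}})
    return out


left = [{"s": "ex:b"}]
print(lateral_subselect(left, {"label"}, inner))       # [{'s': 'ex:b', 'label': 'A1'}]
print(lateral_subselect(left, {"s", "label"}, inner))  # [{'s': 'ex:b', 'label': 'B1'}]
```

With SELECT ?label every LHS row gets the first label in the data, whatever its ?s; projecting ?s as well connects the two.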

There needs to be a new syntax restriction: no variable introduced by AS (in BIND or a sub-query SELECT) or by VALUES at the top level of the LATERAL RHS may have the same name as any in-scope variable from the LHS.

## ** Illegal **
SELECT * {
   ?s ?p ?o
   LATERAL { BIND( 123 AS ?o) }
}

See SPARQL Grammar note 12.
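A sketch of how a parser might enforce the restriction, assuming it has already computed the LHS in-scope variables and the variables assigned by AS or VALUES at the top level of the RHS; `check_lateral_rhs` is a hypothetical name, not part of Jena.

```python
from typing import Set


def check_lateral_rhs(lhs_in_scope: Set[str], rhs_assigned: Set[str]) -> None:
    """Reject a LATERAL whose RHS top level assigns (via AS or VALUES)
    a variable that is already in scope from the LHS."""
    clash = sorted(lhs_in_scope & rhs_assigned)
    if clash:
        names = ", ".join("?" + v for v in clash)
        raise ValueError(f"LATERAL RHS reassigns LHS variable(s): {names}")


# ?s ?p ?o LATERAL { BIND(123 AS ?o) }  -> illegal
try:
    check_lateral_rhs({"s", "p", "o"}, {"o"})
except ValueError as err:
    print(err)  # LATERAL RHS reassigns LHS variable(s): ?o

# ?s ?p ?o LATERAL { BIND(123 AS ?n) }  -> accepted
check_lateral_rhs({"s", "p", "o"}, {"n"})
```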

In ARQ, LET would work here: LET on a variable that is already bound acts like a filter.
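A Python sketch of the LET behaviour described above, assuming term comparison can be simplified to string equality and ignoring expression evaluation errors; `let` is an illustrative function, not the ARQ implementation.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def let(rows: List[Row], var: str, expr: Callable[[Row], str]) -> List[Row]:
    """LET(?var := expr) as described above: bind `var` if it is unbound;
    if it is already bound, keep the row only when the values agree."""
    out: List[Row] = []
    for row in rows:
        value = expr(row)
        if var not in row:
            out.append({**row, var: value})
        elif row[var] == value:  # already bound: behaves like a filter
            out.append(row)
    return out


rows = [{"o": "123"}, {"o": "456"}, {"s": "ex:a"}]
print(let(rows, "o", lambda row: "123"))
# [{'o': '123'}, {'s': 'ex:a', 'o': '123'}]
```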

Evaluation

For each LHS row, substituting the row's variables into the RHS (subject to the same restrictions) and then executing the pattern gives the evaluation of LATERAL.

Notes

There is a similarity to FILTER EXISTS / FILTER NOT EXISTS, expressed as the not-legal FILTER ( ASK { pattern } ), where the variables of the row being filtered are available to “pattern”. This is similar to an SQL correlated subquery.
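The similarity, and the difference, can be sketched in Python with dict-based rows and the inner pattern modelled as a function of the current row (all names hypothetical). Both evaluate the pattern once per row, but EXISTS only tests for a solution and keeps the row unchanged, while LATERAL keeps the pattern's solutions.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]
Rhs = Callable[[Row], Iterable[Row]]


def filter_exists(left: List[Row], rhs: Rhs) -> List[Row]:
    """FILTER EXISTS { P }: evaluate P per row; keep the row unchanged if P has a solution."""
    return [row for row in left if any(True for _ in rhs(row))]


def lateral(left: List[Row], rhs: Rhs) -> List[Row]:
    """LATERAL { P }: evaluate P per row and keep P's solutions."""
    return [{**row, **out} for row in left for out in rhs(row)]


LABELS = {"ex:a": ["A1", "A2"]}


def labels(row: Row) -> List[Row]:
    return [{"label": text} for text in LABELS.get(row["s"], [])]


left = [{"s": "ex:a"}, {"s": "ex:b"}]
print(filter_exists(left, labels))  # [{'s': 'ex:a'}]
print(lateral(left, labels))
# [{'s': 'ex:a', 'label': 'A1'}, {'s': 'ex:a', 'label': 'A2'}]
```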

Elsewhere

Spec updates

Syntax

LATERAL is added to the SPARQL grammar at rule [[56] GraphPatternNotTriples](https://www.w3.org/TR/sparql11-query/#rGraphPatternNotTriples). As a syntax form, it is similar to OPTIONAL.

[56]  	GraphPatternNotTriples	  ::=  	GroupOrUnionGraphPattern | OptionalGraphPattern | LateralGraphPattern | ...
[57]  	OptionalGraphPattern	  ::=  	'OPTIONAL' GroupGraphPattern
[  ]  	LateralGraphPattern	  ::=  	'LATERAL' GroupGraphPattern

Algebra

The new algebra operator is lateral, which takes two expressions. For example,

  SELECT * {
    ?s  ?p  ?o
    LATERAL
      { ?a  ?b  ?c }
  }

is translated to:

  (lateral
    (bgp (triple ?s ?p ?o))
    (bgp (triple ?a ?b ?c)))

Evaluation

To evaluate lateral:

  • Evaluate the first argument (the left-hand side in the syntax) to get a multiset of solution mappings.
  • For each solution mapping (“row”):
      • inject the row's variable bindings into the second argument;
      • evaluate this pattern;
      • add the resulting solutions to the results.
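These steps can be sketched as a tiny evaluator over a tuple encoding of the algebra. This is an illustration under stated assumptions, not ARQ: `evaluate`, `match`, `DATA` and the `limit` operator (a simplified stand-in for slice) are hypothetical, and injection is modelled by starting the RHS evaluation from the LHS row. That matches a BIND placed before a plain BGP, but it does not model sub-select scoping.

```python
from typing import Dict, Iterator, List, Tuple

Row = Dict[str, str]
Triple = Tuple[str, str, str]

DATA: List[Triple] = [
    ("ex:a", "rdfs:label", "A1"),
    ("ex:a", "rdfs:label", "A2"),
    ("ex:b", "rdfs:label", "B1"),
]


def match(pattern: Triple, row: Row) -> Iterator[Row]:
    """Yield every extension of `row` under which `pattern` matches a triple in DATA."""
    for triple in DATA:
        new = dict(row)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if new.setdefault(term, value) != value:
                    break  # variable already bound to a different term
            elif term != value:
                break  # constant does not match
        else:
            yield new


def evaluate(op: tuple, start: Row) -> List[Row]:
    """Evaluate a tuple-encoded algebra expression; `start` is the injected row."""
    kind = op[0]
    if kind == "bgp":
        rows = [start]
        for pattern in op[1:]:
            rows = [ext for r in rows for ext in match(pattern, r)]
        return rows
    if kind == "limit":  # simplified stand-in for (slice _ n ...)
        return evaluate(op[2], start)[: op[1]]
    if kind == "lateral":
        # Evaluate the LHS, then evaluate the RHS once per row, with that row injected.
        return [out for row in evaluate(op[1], start) for out in evaluate(op[2], row)]
    raise ValueError(f"unknown operator: {kind}")


query = ("lateral",
         ("bgp", ("?s", "rdfs:label", "?o")),
         ("limit", 1, ("bgp", ("?s", "rdfs:label", "?label"))))
for row in evaluate(query, {}):
    print(row)
# {'?s': 'ex:a', '?o': 'A1', '?label': 'A1'}
# {'?s': 'ex:a', '?o': 'A2', '?label': 'A1'}
# {'?s': 'ex:b', '?o': 'B1', '?label': 'B1'}
```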

Outline:

Definition: Lateral

Let Ω be a multiset of solution mappings. We define:

Lateral(Ω, P) = the multiset union of Ω1 over all μ1 in Ω, where
           P1 = inject(P, μ1)
           Ω1 = eval(D(G), P1)

where inject is the corrected substitute operation.
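The idea behind inject can be sketched as a textual rewrite in Python. This is a simplification, not the actual corrected substitute operation: the hypothetical `inject` only adds BINDs at the start of the RHS group, so the variable names stay in the query and sub-selects keep their own scope. It also shows why the syntax restriction on AS and VALUES is needed: a later BIND( ... AS ?s) in the same group would then be illegal.

```python
from typing import Dict


def inject(group_body: str, row: Dict[str, str]) -> str:
    """Hypothetical sketch: fix each LHS binding with a BIND placed before
    the RHS pattern, keeping the variable names in the query."""
    binds = " ".join(f"BIND({term} AS ?{var})" for var, term in row.items())
    return "{ " + binds + " " + group_body + " }"


print(inject("?s rdfs:label ?label", {"s": "<http://example/a>"}))
# { BIND(<http://example/a> AS ?s) ?s rdfs:label ?label }
```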

An alternative style is to define Lateral more like “evaluate P such that μ is in-scope” in some way, rather than rely on inject which is a mechanism.

Definition: Evaluation of Lateral

eval(D(G), Lateral(P1, P2)) = Lateral(eval(D(G), P1), P2)

Issue Analytics

  • State: closed
  • Created 10 months ago
  • Reactions: 3
  • Comments: 19 (15 by maintainers)

Top GitHub Comments

1 reaction
afs commented, Nov 15, 2022

Does it mean that “SPARQL is evaluated bottom-up” would not be true anymore?

Have to be careful here.

It’s the operator that determines the evaluation; there isn’t a policy for the whole expression. It’s just that most current algebra operators use depth-first evaluation (i.e. they behave as functions), so we all say “evaluated bottom-up”.

(join A B) is bottom up because join says “evaluate the arguments separately then join the resulting tables”. It is the “evaluate the arguments then …” which makes it a well-behaved function; it is doing a depth first walk “bottom-up”.

(lateral A B) says “eval A, then loop on its rows, evaluating B such that the row from A is available for variables”. Defining “the row being available to B” as careful injection means eval B[row] is normal SPARQL evaluation, not a special case for inside LATERAL.

The proposal is that the row is injected (by the corrected substitute operation): the variable name is still there, but its binding is fixed by a BIND placed just before it. There are places that require a variable, e.g. SELECT ?var or FILTER(bound(?var)), where replacing a variable by its value fails.
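For contrast, a naive textual substitution, sketched in Python with a hypothetical `substitute` helper, produces text that is not legal SPARQL: neither SELECT <iri> nor bound(<iri>) parses, because both positions require a variable.

```python
import re
from typing import Dict


def substitute(query: str, row: Dict[str, str]) -> str:
    """Naive textual substitution: replace every ?var by its bound term."""
    for var, term in row.items():
        query = re.sub(r"\?" + re.escape(var) + r"\b", lambda _m: term, query)
    return query


q = "SELECT ?s ?label { ?s rdfs:label ?label FILTER(bound(?s)) }"
print(substitute(q, {"s": "<http://example/a>"}))
# SELECT <http://example/a> ?label { <http://example/a> rdfs:label ?label FILTER(bound(<http://example/a>)) }
```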


There is a discussion point about whether “eval B with row from A” should or should not use the in-scope rules for variables:

SELECT * {
   ?s ?p ?o
   LATERAL {
     SELECT ?label { ?s rdfs:label ?label } LIMIT 1
   }
}

Does the ?s in { ?s rdfs:label ?label } connect to the ?s before the LATERAL?

From a SPARQL POV, that sub-query can otherwise be SELECT ?label { ?z rdfs:label ?label } LIMIT 1 or SELECT ?label { [] rdfs:label ?label } LIMIT 1 for the same results. ?label is unrelated to the LHS triple because ?s isn’t in the SELECT.

SELECT ?s ?label { ?s rdfs:label ?label } LIMIT 1 does make the ?s see the ?s of ?s ?p ?o. Ditto SELECT * { ?s rdfs:label ?label } LIMIT 1.

Alternatively, just for LATERAL, there could be “no scope rules”, so that the inner ?s does see the LHS ?s.

At the moment, I’m more inclined to the scoping version, so that there isn’t an eval special case for “inside LATERAL” and developing big queries piece-by-piece is (arguably) more predictable, but it does cause a “surprise” case. Another reason is that special cases tend to have complicated consequences.

When/if we have query templates and parameterization, unconditionally replacing ?s by an RDF term makes sense and is easier for users to comprehend.

1 reaction
Aklakan commented, Nov 13, 2022

Possible addition: LATERAL ( ?var1 ?var2 …)

For interoperability, it would be good to agree on whether variable lists are unsupported, optional or mandatory. They are not strictly necessary, but someone may find them useful, e.g. for extra control when copy/pasting a long graph pattern into the RHS of a lateral join, or simply for clarity. Conversely, making the list optional would allow more brevity, especially with simple graph patterns.

To me it seems an ‘optional’ variable list would be most convenient to use.

Also, proper lateral join support would solve an issue with the service enhancer which @LorenzBuehmann found out:

  • OPTIONAL { SERVICE <loop:> { X } } becomes OpConditional which results in an exception during algebra-to-syntax reversal in OpAsQuery. So the service enhancer attempts to combine OPTIONAL { LATERAL {} } into one construct but that cannot be handled properly with the current machinery.
  • The mitigation so far is SERVICE <loop:> { OPTIONAL { X } } but this has an undesired side effect when e.g. combining this with SERVICE <cache:>: The cache indexes results based on the parameters (algebra, joining input binding, row number). As a consequence, the graph pattern X is of course considered different from OPTIONAL { X } which possibly results in cache misses.

So proper lateral is very welcome!
