How to use a correlated subquery?

I’m trying to write a query with a subquery that references rows from the parent query. For example, say I want SQL for getting the latest weather value from a table of weather data that looks roughly like

SELECT * FROM weather w
WHERE 
  w.time >= '2023-02-01' AND w.time <= '2023-02-10' 
  AND w.timestamp = (
    SELECT MAX(i.timestamp) FROM weather i
    WHERE i.location = w.location AND i.instrument = w.instrument AND i.time = w.time
  )
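
To make the intended semantics concrete, here is a minimal in-memory sketch in plain F# of what that SQL computes. It does not use SqlHydra; the Weather record, its field names, and the latestInWindow function are assumptions invented for illustration.

```fsharp
open System

// Hypothetical row shape; the fields mirror the columns used in the SQL.
type Weather =
    { Location: string
      Instrument: string
      Time: DateTime        // the time the reading applies to
      Timestamp: DateTime   // when this version of the reading was recorded
      Value: float }

let latestInWindow (lBound: DateTime) (uBound: DateTime) (rows: Weather list) =
    // The "subquery": newest timestamp among rows that match the outer row w.
    // It is re-evaluated for every outer row, which is what makes it correlated.
    let maxTimestampFor (w: Weather) =
        rows
        |> List.filter (fun i ->
            i.Location = w.Location && i.Instrument = w.Instrument && i.Time = w.Time)
        |> List.map (fun i -> i.Timestamp)
        |> List.max   // never empty: w always matches itself
    rows
    |> List.filter (fun w ->
        let inWindow = w.Time >= lBound && w.Time <= uBound
        inWindow && w.Timestamp = maxTimestampFor w)

// Usage: two versions of the same reading; only the newer one survives.
let day = DateTime(2023, 2, 5)
let rows =
    [ { Location = "A"; Instrument = "thermo"; Time = day; Timestamp = day.AddHours 1.0; Value = 1.0 }
      { Location = "A"; Instrument = "thermo"; Time = day; Timestamp = day.AddHours 2.0; Value = 2.0 } ]
let result = latestInWindow (DateTime(2023, 2, 1)) (DateTime(2023, 2, 10)) rows
// result holds only the row with Value = 2.0
```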

First I tried

let lBound,uBound = (DateTime(2023,02,01), DateTime(2023,02,10))
select { 
  for w in table<weather> do
  where (
    w.time >= lBound && w.time <= uBound && w.timestamp = subqueryOne (
      select { 
        for inner in table<weather> do
        where (inner.location = w.location && inner.instrument = w.instrument && inner.time = w.time)
        select (maxBy inner.timestamp)
      }
    )
  )
}

but that doesn’t let me use a select to start a subquery inside another one. So then I tried

let lBound,uBound = (DateTime(2023,02,01), DateTime(2023,02,10))
let selectInner = select
select { 
  for w in table<weather> do
  where (
    w.time >= lBound && w.time <= uBound && w.timestamp = subqueryOne (
      selectInner { 
        for inner in table<weather> do
        where (inner.location = w.location && inner.instrument = w.instrument && inner.time = w.time)
        select (maxBy inner.timestamp)
      }
    )
  )
}

which compiles, but gave me

System.NotImplementedException : The method or operation is not implemented.
Stack Trace:
   at SqlHydra.Query.LinqExpressionVisitors.visit@212(FSharpFunc2 qualifyColumn, Expression exp, Query query)
   at SqlHydra.Query.LinqExpressionVisitors.visit@212(FSharpFunc2 qualifyColumn, Expression exp, Query query)
   at SqlHydra.Query.SelectBuilders.SelectBuilder2.Where[T](QuerySource2 state, Expression`1 whereExpression)

I thought I’d try

let lBound,uBound = (DateTime(2023,02,01), DateTime(2023,02,10))
let sub = select {
  for inner in table<weather> do
  groupBy (inner.location, inner.instrument, inner.time)
  select (inner.location, inner.instrument, inner.time, maxBy inner.timestamp)
}
select { 
  for w in table<weather> do
  where (
    w.time >= lBound && w.time <= uBound && 
    (w.location, w.instrument, w.time, w.timestamp) = subqueryOne sub
  )
}

which also gives me a NotImplementedException (not surprising; I think there’s another open issue about using tuples in a where clause). I also tried

let lBound,uBound = (DateTime(2023,02,01), DateTime(2023,02,10))
let sub = select {
  for inner in table<weather> do
  groupBy (inner.location, inner.instrument, inner.time)
  select (inner.location, inner.instrument, inner.time, maxBy inner.timestamp)
}
select { 
  for w in table<weather> do
  for (location, instrument, time, timestamp) in subqueryMany sub do
  where (
    w.time >= lBound && w.time <= uBound && 
    w.location = location && w.instrument = instrument && w.time = time && w.timestamp = timestamp
  )
}

but that fails to compile.
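
For reference, the result that the last two attempts aim for (group first, then match rows against the per-group maxima) can be sketched in memory with plain F#. This is only an illustration of the intended semantics, not SqlHydra code; the Weather record and the latestByGroup function are hypothetical names.

```fsharp
open System

// Hypothetical row shape; the fields mirror the columns used in the queries.
type Weather =
    { Location: string
      Instrument: string
      Time: DateTime
      Timestamp: DateTime
      Value: float }

let latestByGroup (lBound: DateTime) (uBound: DateTime) (rows: Weather list) =
    // Counterpart of `sub`: one (location, instrument, time, max timestamp) tuple per group.
    let maxima =
        rows
        |> List.groupBy (fun i -> (i.Location, i.Instrument, i.Time))
        |> List.map (fun ((location, instrument, time), group) ->
            let newest = group |> List.map (fun i -> i.Timestamp) |> List.max
            (location, instrument, time, newest))
        |> Set.ofList
    // Counterpart of the outer query: keep rows in the window whose tuple is one of the maxima.
    rows
    |> List.filter (fun w ->
        let inWindow = w.Time >= lBound && w.Time <= uBound
        inWindow && Set.contains (w.Location, w.Instrument, w.Time, w.Timestamp) maxima)
```

Unlike a correlated subquery, this computes the maxima once for the whole table instead of once per outer row, which is why it can be expressed as a join or tuple comparison in SQL.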

Any suggestions on how to get this sort of query to work?

Issue Analytics

  • State: closed
  • Created 7 months ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

1 reaction
JordanMarr commented, Feb 16, 2023

Here are a few options for getting correlated subqueries working:

Option 1

The subquery still lives in its own function, and the parent table is passed into it. (The name of the passed-in table, od, would matter because it is used to define the table alias, so it would need to match the name used in the parent query.)

let maxOrderQty (od: Sales.SalesOrderDetail) = 
    select {
        for d in orderDetailTable do
        where (d.ProductID = od.ProductID)
        select (maxBy d.OrderQty)
    }

let! results = 
    select {
        for od in orderDetailTable do
        where (od.OrderQty = subqueryOne (maxOrderQty od))
        orderBy od.ProductID
        select (od.SalesOrderID, od.ProductID, od.OrderQty)
    }
    |> ctx.ReadAsync HydraReader.Read

The problem I ran into in my experiment branch is that the LinqExpressionVisitor would need to actually evaluate that function to get the resulting SqlKata.Query, and I’m not sure that is possible. So that might be a dead end… I’m not sure.

Option 2

One way to bypass this could be to create a new function, similar to table<>, that could be used in a separate subquery function to declaratively designate a parent table source without actually passing one in. Something like this:

let maxOrderQty = 
    let od = correlatedTable<Sales.SalesOrderDetail> // or maybe `parentTable`
    select {
        for d in orderDetailTable do
        where (d.ProductID = od.ProductID)
        select (maxBy d.OrderQty)
    }

let! results = 
    select {
        for od in orderDetailTable do
        where (od.OrderQty = subqueryOne maxOrderQty)
        orderBy od.ProductID
        select (od.SalesOrderID, od.ProductID, od.OrderQty)
    }
    |> ctx.ReadAsync HydraReader.Read

Note that the correlatedTable function would be similar to the table function, but its definition would need to return an instance of the table 'T itself instead of a QuerySource<'T>.

Option 3

The third option would be to allow nesting the query within the parent query, at which point it should be able to access the parent od table. I have seen this done before in the Pulumi.FSharp.Extensions project, and I think I asked the author in the issues forum how it was done, but I don’t remember the answer:

bucket {
    name "bucket-example"
    acl  "private"

    bucketWebsite { 
        indexDocument "index.html"
    }
}

In this case, the subQueryOne and subQueryMany could be turned into nested CE builders of their own:

let! results = 
    select {
        for od in orderDetailTable do
        where (od.OrderQty = 
            subQueryOne {
                for d in orderDetailTable do
                where (d.ProductID = od.ProductID)
                select (maxBy d.OrderQty)
            }
        )
        orderBy od.ProductID
        select (od.SalesOrderID, od.ProductID, od.OrderQty)
    }
    |> ctx.ReadAsync HydraReader.Read

However, as I look at the buckets example from the Pulumi library, I don’t think it would work for us since we need the subquery to be within the where clause.

TBH, it seems to me that Option 2 is our best bet. It’s reasonably easy to understand, and should be easy to implement. What do you think?

1 reaction
ntwilson commented, Feb 10, 2023

Thanks for the help on this! Having the workaround for a hand-written query is definitely helpful.
