Optimize (NOT) EXISTS joins
I love, and am sometimes still surprised by, how well linq2db optimizes (dynamic) queries that sometimes contain large unused portions. 👏
It’s not critical but there is an opportunity to remove unused OUTER JOINs and OUTER APPLY from [NOT] EXISTS queries.
Here’s the full story:
I was running some query-based UPDATEs in Oracle. Sadly, Oracle doesn’t support FROM in UPDATE, so linq2db has to duplicate the query into the SET and WHERE EXISTS clauses, which is still much better than doing it yourself by hand! 😃
This resulted in the following (simplified) SQL:
UPDATE T1
SET T1.X = (/* select copied here once for each field */)
WHERE
    EXISTS(
        SELECT *
        FROM
            T1 ext_2
                LEFT JOIN T2 c_7 ON c_7.ISO = Substr(ext_2.SRC_BQE_BIC, 6, 2)
                LEFT JOIN T2 c_8 ON c_8.ISO = Substr(ext_2.DES_BQE_BIC, 6, 2)
                LEFT JOIN T3 c_9 ON c_9.TC = ext_2.TREA_CENT AND c_9.BIC = ext_2.SRC_BQE_BIC
        WHERE
            ext_2.NOT_HANDLED = 'X' AND ext_2.TRANS_CHANNEL IS NULL AND
            T1.EXT_ACC_PMT_SEQ = ext_2.EXT_ACC_PMT_SEQ
    )
This is mechanically correct: the full query was copied into the WHERE EXISTS.
But if I had written it by hand, I wouldn’t have written a WHERE EXISTS at all, and a tool like linq2db could notice the following optimization opportunities:
- Unused OUTER JOINs produce 1…N rows for each row of the left side and do not change whether a result exists, so they can be removed completely.
- Once they are removed, the only table left is ext_2, which is the updated table, repeated here only to carry the LEFT JOINs. With no such joins left, it can be removed too: its conditions move to T1, the EXISTS is effectively a SELECT FROM DUAL, and the EXISTS itself can be dropped.
After optimization, the whole statement could be just:
UPDATE T1
SET X = ( /* subquery */)
WHERE T1.NOT_HANDLED = 'X' AND T1.TRANS_CHANNEL IS NULL
That would be very neat, and it is what I would have written manually.
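To sanity-check the rewrite, here is a minimal sketch that runs both forms of the UPDATE on the same data and compares the results. It makes several assumptions that are not in the issue: it uses SQLite (through Python’s sqlite3 module) instead of Oracle, the schema is a hypothetical, reduced version of T1 and T2 with shortened column names, and seq (standing in for EXT_ACC_PMT_SEQ) is assumed to be a unique key of T1. Without that uniqueness, collapsing the self-join would not be safe.

```python
import sqlite3

# Hypothetical, reduced schema: seq stands in for EXT_ACC_PMT_SEQ and is
# assumed to be unique (here: the primary key).
# Row 1 qualifies and has two T2 matches, row 2 qualifies with no T2 match,
# row 3 has trans_channel set, row 4 has not_handled unset.
SETUP = """
CREATE TABLE t1 (seq INTEGER PRIMARY KEY, not_handled TEXT,
                 trans_channel TEXT, bic TEXT, x TEXT);
CREATE TABLE t2 (iso TEXT);
INSERT INTO t1 VALUES
    (1, 'X',  NULL,  'AAAAAFRXX', NULL),
    (2, 'X',  NULL,  'AAAAAZZXX', NULL),
    (3, 'X',  'WEB', 'AAAAAFRXX', NULL),
    (4, NULL, NULL,  'AAAAADEXX', NULL);
INSERT INTO t2 VALUES ('FR'), ('FR'), ('DE');
"""

# Shape of the generated statement: self-join plus an unused LEFT JOIN.
GENERATED = """
UPDATE t1 SET x = 'done'
WHERE EXISTS (
    SELECT *
    FROM t1 ext_2
        LEFT JOIN t2 c_7 ON c_7.iso = substr(ext_2.bic, 6, 2)
    WHERE ext_2.not_handled = 'X' AND ext_2.trans_channel IS NULL
      AND t1.seq = ext_2.seq
)
"""

# Shape of the hand-written statement proposed in the issue.
SIMPLIFIED = """
UPDATE t1 SET x = 'done'
WHERE t1.not_handled = 'X' AND t1.trans_channel IS NULL
"""

def run_update(update_sql):
    """Apply update_sql to a fresh copy of the data and return all rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    conn.execute(update_sql)
    rows = conn.execute("SELECT seq, x FROM t1 ORDER BY seq").fetchall()
    conn.close()
    return rows

generated_rows = run_update(GENERATED)
simplified_rows = run_update(SIMPLIFIED)
print(generated_rows == simplified_rows)
```

On this data both statements update the same rows (seq 1 and 2), including the row whose LEFT JOIN finds no match and the row whose LEFT JOIN finds two.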
Bonus chatter: a good SQL plan optimizer would completely ignore the LEFT JOINs in its plan, for the same reason linq2db could remove them. I looked at the Oracle 12 execution plan, and it did not drop the LEFT JOINs but kept them in the plan anyway 😞. So not only is the optimized query “nicer”, it would also be more efficient.
Environment details
linq2db version: 3.2.3
Database Server: Oracle 12
Database Provider: Managed ODP.NET
Operating system: Win 10
.NET Framework: 5
The gist of the reasoning is that inside an EXISTS, you only care whether the subquery has a result or not. An OUTER join never drops rows, only multiplies them. So an OUTER join cannot turn a non-existent result into a new result, nor make an existent result disappear. Hence if it’s not used in some other expression, it can be dropped from the query.
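The property can be checked on a toy example. The sketch below is not taken from linq2db. It uses SQLite through Python’s sqlite3 module, with hypothetical tables t1 and t2, and compares an EXISTS subquery with an unused LEFT JOIN, the same subquery without the join, and (for contrast) an INNER JOIN, which does drop rows and therefore cannot be removed the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: id 1 matches two t2 rows, id 2 matches one, id 3 none.
conn.executescript("""
CREATE TABLE t1 (id INTEGER PRIMARY KEY, iso TEXT);
CREATE TABLE t2 (iso TEXT);
INSERT INTO t1 VALUES (1, 'FR'), (2, 'DE'), (3, 'XX');
INSERT INTO t2 VALUES ('FR'), ('FR'), ('DE');
""")

def ids(sql):
    """Return the first column of every result row as a list."""
    return [row[0] for row in conn.execute(sql)]

# EXISTS with an unused LEFT JOIN: every t1 row survives the join.
with_left_join = ids("""
    SELECT o.id FROM t1 o
    WHERE EXISTS (SELECT * FROM t1 i
                  LEFT JOIN t2 c ON c.iso = i.iso
                  WHERE i.id = o.id)
    ORDER BY o.id
""")

# Same EXISTS with the join removed.
without_join = ids("""
    SELECT o.id FROM t1 o
    WHERE EXISTS (SELECT * FROM t1 i WHERE i.id = o.id)
    ORDER BY o.id
""")

# Contrast: an INNER JOIN drops the row that has no match in t2.
with_inner_join = ids("""
    SELECT o.id FROM t1 o
    WHERE EXISTS (SELECT * FROM t1 i
                  JOIN t2 c ON c.iso = i.iso
                  WHERE i.id = o.id)
    ORDER BY o.id
""")

print(with_left_join, without_join, with_inner_join)
```

The first two lists are identical, while the INNER JOIN variant loses id 3, which is why the argument applies only to OUTER joins whose columns are not referenced anywhere else in the subquery.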
I need to think about this. Remind me after the release; maybe I’ll find a solution.