Optimize (NOT) EXISTS joins

I love, and am sometimes still surprised by, how well linq2db optimizes (dynamic) queries that sometimes contain large unused portions. 👏

It’s not critical but there is an opportunity to remove unused OUTER JOINs and OUTER APPLY from [NOT] EXISTS queries.

Here’s the full story:

I was running some query-based UPDATEs in Oracle. Sadly, Oracle doesn’t support FROM in UPDATE, so linq2db has to duplicate the query into the SET and WHERE EXISTS clauses, which is still much better than doing it yourself by hand! 😃

This resulted in the following (simplified) SQL:

UPDATE T1
SET T1.X = (/* select copied here once for each field */)
WHERE
	EXISTS(
		SELECT *
		FROM
			T1 ext_2
				LEFT JOIN T2 c_7 ON c_7.ISO = Substr(ext_2.SRC_BQE_BIC, 6, 2)
				LEFT JOIN T2 c_8 ON c_8.ISO = Substr(ext_2.DES_BQE_BIC, 6, 2)
				LEFT JOIN T3 c_9 ON c_9.TC = ext_2.TREA_CENT AND c_9.BIC = ext_2.SRC_BQE_BIC 
		WHERE
			ext_2.NOT_HANDLED = 'X' AND ext_2.TRANS_CHANNEL IS NULL AND
			T1.EXT_ACC_PMT_SEQ = ext_2.EXT_ACC_PMT_SEQ
	)

This is mechanically correct: the full query was copied into the WHERE EXISTS.

But if I had written it by hand, I wouldn’t have written a WHERE EXISTS at all, and a tool like linq2db could notice the following optimization opportunities:

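  1. Unused OUTER JOINs have cardinality 1…N per source row, so they never change whether a result exists and can be removed completely.

  2. Once they are removed, the only table left is ext_2, which is the updated table itself, repeated here only to support the LEFT JOINs. With no such joins left it can be removed as well (assuming the correlation column, EXT_ACC_PMT_SEQ here, is a unique, non-null key). The EXISTS is then effectively a SELECT FROM DUAL and can be removed too, leaving only its filter conditions.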
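The first point can be checked on a toy database. The sketch below is only an illustration, not linq2db output: it uses SQLite through Python's standard `sqlite3` module instead of Oracle, and the table layout (`ID`, `CODE`, `FLAG`, `ISO`) is a made-up stand-in for the real schema. It runs the same EXISTS filter with and without an unused LEFT JOIN and compares the selected rows.

```python
import sqlite3

# Hypothetical schema: a tiny stand-in for the tables in the issue.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (ID INTEGER PRIMARY KEY, CODE TEXT, FLAG TEXT);
    CREATE TABLE T2 (ISO TEXT);
    INSERT INTO T1 VALUES (1, 'FR', 'X'), (2, 'DE', 'X'), (3, 'US', NULL), (4, 'ZZ', 'X');
    -- 'FR' is present twice, so the join multiplies row 1; 'US' and 'ZZ' never match.
    INSERT INTO T2 VALUES ('FR'), ('FR'), ('DE');
""")

# EXISTS subquery carrying a LEFT JOIN whose alias is never used elsewhere.
with_join = """
    SELECT t.ID FROM T1 t
    WHERE EXISTS (
        SELECT * FROM T1 e
            LEFT JOIN T2 c ON c.ISO = e.CODE
        WHERE e.FLAG = 'X' AND e.ID = t.ID)
    ORDER BY t.ID
"""

# Same subquery with the unused LEFT JOIN removed.
without_join = """
    SELECT t.ID FROM T1 t
    WHERE EXISTS (
        SELECT * FROM T1 e
        WHERE e.FLAG = 'X' AND e.ID = t.ID)
    ORDER BY t.ID
"""

ids_with_join = [row[0] for row in conn.execute(with_join)]
ids_without_join = [row[0] for row in conn.execute(without_join)]
print(ids_with_join, ids_without_join)
```

Row 1 matches T2 twice and row 4 matches nothing, yet both are selected either way, because the LEFT JOIN can only add copies of a row that already exists.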

After optimization, the WHERE could be just:

UPDATE T1
SET X = ( /* subquery */)
WHERE T1.NOT_HANDLED = 'X' AND T1.TRANS_CHANNEL IS NULL
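As a rough sanity check of the whole rewrite, the two UPDATE shapes can be run against identical toy data and compared. This is a sketch under assumptions: SQLite (via Python's `sqlite3`) stands in for Oracle, the columns `SEQ` and `CODE` and the constant `'done'` are hypothetical placeholders for the real key, join column and SET subquery, and `SEQ` is declared as a primary key because the simplification relies on the correlation column being unique and non-null.

```python
import sqlite3

def fresh_db():
    """Build a small in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE T1 (SEQ INTEGER PRIMARY KEY, NOT_HANDLED TEXT,
                         TRANS_CHANNEL TEXT, CODE TEXT, X TEXT);
        CREATE TABLE T2 (ISO TEXT);
        INSERT INTO T1 VALUES
            (1, 'X',  NULL,  'FR', NULL),
            (2, 'X',  'WEB', 'FR', NULL),
            (3, NULL, NULL,  'DE', NULL),
            (4, 'X',  NULL,  'ZZ', NULL);
        INSERT INTO T2 VALUES ('FR'), ('DE');
    """)
    return conn

# Shape of the generated statement: the query is repeated inside WHERE EXISTS.
EXISTS_FORM = """
    UPDATE T1 SET X = 'done'
    WHERE EXISTS (
        SELECT * FROM T1 ext_2
            LEFT JOIN T2 c_7 ON c_7.ISO = ext_2.CODE
        WHERE ext_2.NOT_HANDLED = 'X' AND ext_2.TRANS_CHANNEL IS NULL
          AND T1.SEQ = ext_2.SEQ)
"""

# Shape after the proposed optimization: only the filter conditions remain.
SIMPLE_FORM = """
    UPDATE T1 SET X = 'done'
    WHERE T1.NOT_HANDLED = 'X' AND T1.TRANS_CHANNEL IS NULL
"""

def updated_rows(sql):
    """Run one UPDATE on fresh data and return the keys of the rows it touched."""
    conn = fresh_db()
    conn.execute(sql)
    return [row[0] for row in conn.execute(
        "SELECT SEQ FROM T1 WHERE X = 'done' ORDER BY SEQ")]

print(updated_rows(EXISTS_FORM), updated_rows(SIMPLE_FORM))
```

If the correlation column were not a unique, non-null key, the two statements could diverge, so an optimizer would need that guarantee before applying the second step.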

That would be very neat, and it is what I would have written manually.

Bonus chatter: a good SQL plan optimizer would ignore the LEFT JOINs entirely, for the same reason linq2db could remove them. I looked at the Oracle 12 execution plan, and it did not drop the left joins but kept them in the plan anyway 😞. So the optimized request is not only “nicer”, it would also be more efficient.

Environment details

linq2db version: 3.2.3
Database Server: Oracle 12
Database Provider: Managed ODP.NET
Operating system: Win 10
.NET Framework: 5

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

1 reaction
jods4 commented, Feb 5, 2021

The gist of the reasoning is that inside an EXISTS, you only care whether the subquery has a result or not. An OUTER join never drops rows, only multiplies them. So an OUTER join cannot turn a non-existent result into a new result, nor make an existent result disappear. Hence if it’s not used in some other expression, it can be dropped from the query.
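The rule described here can be sketched as a small pruning pass. This is a hypothetical illustration only, not how linq2db represents or optimizes queries: `Join`, `prune_exists_joins`, and the alias-set representation are invented for the example. A LEFT join is dropped when its alias appears neither in the WHERE clause nor in another join's ON condition, and the pass repeats because dropping one join can leave another unused.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Join:
    kind: str             # "LEFT" or "INNER"
    alias: str            # alias introduced by this join
    on_refs: frozenset    # aliases mentioned in this join's ON condition

def prune_exists_joins(joins, where_refs):
    """Drop LEFT joins inside an EXISTS subquery whose alias is unused elsewhere."""
    kept = list(joins)
    changed = True
    while changed:
        changed = False
        for join in kept:
            # Aliases used by the WHERE clause or by the ON clauses of other joins.
            used = set(where_refs)
            for other in kept:
                if other is not join:
                    used |= other.on_refs
            if join.kind == "LEFT" and join.alias not in used:
                kept.remove(join)
                changed = True
                break  # restart: the removal may have freed another join
    return kept

# The three joins from the issue; the WHERE clause only mentions ext_2 and T1.
issue_joins = [
    Join("LEFT", "c_7", frozenset({"c_7", "ext_2"})),
    Join("LEFT", "c_8", frozenset({"c_8", "ext_2"})),
    Join("LEFT", "c_9", frozenset({"c_9", "ext_2"})),
]
remaining = prune_exists_joins(issue_joins, {"ext_2", "T1"})
print(remaining)
```

INNER joins are deliberately left alone in this sketch, since they can drop rows and therefore can change whether a result exists.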

0 reactions
sdanyliv commented, Feb 5, 2021

I need to think about this. Remind me after the release; maybe I’ll find a solution.
