Avoid cartesian product without query splitting

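Query splitting was introduced, but personally I would avoid its drawbacks like the plague: every included collection costs an extra round trip, and the separate queries are not guaranteed to see a consistent snapshot of the data.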

Luckily, we can usually avoid cartesian products through careful aggregate design. Still, sometimes we truly do want to join multiple 1:N relations, and we would rather not risk a huge result set caused by an incidentally high number of child entities.

This problem can be solved without resorting to query splitting.

As an example, say we are selecting one Parent and joining both its Sons and its Daughters.

Since the joined sibling collections are independent of each other, there is no reason to multiply them together. We can prevent that by explicitly instructing the database to keep them separate:

  • Join a set of constants, one per 1:N relation. Since we intend to join two 1:N relations, we join two constants: LEFT JOIN (SELECT 1 AS Id UNION ALL SELECT 2) AS Splitter ON TRUE.
  • To the Son’s join condition, add: AND Splitter.Id = 1.
  • To the Daughter’s join condition, add: AND Splitter.Id = 2.

This gives us:

SELECT *
FROM Parents p
LEFT JOIN (SELECT 1 AS Id UNION ALL SELECT 2) AS Splitter ON TRUE
LEFT JOIN Sons s ON s.ParentId = p.Id AND Splitter.Id = 1
LEFT JOIN Daughters d ON d.ParentId = p.Id AND Splitter.Id = 2
WHERE p.Id = 1
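
The same pattern extends to any number of independent 1:N relations: add one constant per relation and restrict each child join to its own constant. Below is a minimal Python sketch of a query builder for that. The helper name build_splitter_query and its parameters are hypothetical. It emits CROSS JOIN, which is equivalent to LEFT JOIN ... ON TRUE here because the constant set is never empty, and which SQL Server also accepts (it has no TRUE literal).

```python
def build_splitter_query(parent_table, children):
    """Build one SELECT that LEFT JOINs several independent 1:N child tables
    without multiplying their rows together.

    parent_table: parent table name (assumed to have an Id column).
    children: list of (child_table, foreign_key_column) pairs.
    Names are interpolated directly, so they must be trusted identifiers.
    """
    if not children:
        raise ValueError("need at least one child table")
    # One constant row per child collection: SELECT 1 AS Id UNION ALL SELECT 2 ...
    constants = " UNION ALL ".join(
        "SELECT 1 AS Id" if k == 1 else f"SELECT {k}"
        for k in range(1, len(children) + 1)
    )
    lines = [
        "SELECT *",
        f"FROM {parent_table} p",
        # Same effect as LEFT JOIN ... ON TRUE, since the constant set is never empty.
        f"CROSS JOIN ({constants}) AS Splitter",
    ]
    for k, (table, fk) in enumerate(children, start=1):
        # The k-th child only matches on the rows where Splitter.Id = k.
        lines.append(f"LEFT JOIN {table} c{k} ON c{k}.{fk} = p.Id AND Splitter.Id = {k}")
    return "\n".join(lines)


sql = build_splitter_query("Parents", [("Sons", "ParentId"), ("Daughters", "ParentId")])
print(sql + "\nWHERE p.Id = 1")
```

For the Sons/Daughters example this prints a query of the same shape as the one above, with CROSS JOIN in place of LEFT JOIN ... ON TRUE.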

This transfers a bit more data (mostly because the Parent's columns are repeated on every row), but the duplication grows linearly with the number of children instead of multiplicatively, so it stays well under control.
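
To make the difference concrete: for one parent with n sons and m daughters, the plain double LEFT JOIN returns max(n, 1) × max(m, 1) rows, while the splitter query returns max(n, 1) + max(m, 1). The following minimal sketch checks this against an in-memory SQLite database; the tables are hypothetical and hold only ids, and the row_counts helper is illustrative.

```python
import sqlite3

# Plain double LEFT JOIN: sons and daughters are multiplied together.
COUNT_NAIVE_SQL = """
SELECT COUNT(*) FROM Parents p
LEFT JOIN Sons s ON s.ParentId = p.Id
LEFT JOIN Daughters d ON d.ParentId = p.Id
WHERE p.Id = 1
"""

# Splitter variant: each child table only joins on its own constant row.
COUNT_SPLITTER_SQL = """
SELECT COUNT(*) FROM Parents p
CROSS JOIN (SELECT 1 AS Id UNION ALL SELECT 2) AS Splitter
LEFT JOIN Sons s ON s.ParentId = p.Id AND Splitter.Id = 1
LEFT JOIN Daughters d ON d.ParentId = p.Id AND Splitter.Id = 2
WHERE p.Id = 1
"""


def row_counts(n_sons, n_daughters):
    """Return (naive_rows, splitter_rows) for one parent with the given children."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Parents(Id INTEGER PRIMARY KEY);
        CREATE TABLE Sons(Id INTEGER PRIMARY KEY, ParentId INTEGER);
        CREATE TABLE Daughters(Id INTEGER PRIMARY KEY, ParentId INTEGER);
        INSERT INTO Parents VALUES (1);
    """)
    conn.executemany("INSERT INTO Sons(ParentId) VALUES (?)", [(1,)] * n_sons)
    conn.executemany("INSERT INTO Daughters(ParentId) VALUES (?)", [(1,)] * n_daughters)
    naive = conn.execute(COUNT_NAIVE_SQL).fetchone()[0]
    split = conn.execute(COUNT_SPLITTER_SQL).fetchone()[0]
    conn.close()
    return naive, split


print(row_counts(10, 10))  # (100, 20)
```

With 10 sons and 10 daughters that is 100 rows versus 20; with 100 of each, 10,000 versus 200.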

When we combine careful aggregate design with avoiding the cartesian product, we have all the tools we need to load reasonable object graphs without introducing the significant drawbacks of split queries.
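
On the client side, the flat rows still have to be folded back into an object graph. One possible approach is sketched below in Python with sqlite3, assuming a hypothetical schema where each table also has a Name column: read the Parent columns once, then use Splitter.Id to route each row to the right collection. The load_parent helper and the dict shape it returns are illustrative only, not how any particular ORM materializes results.

```python
import sqlite3

# The query from the issue, selecting explicit columns and using CROSS JOIN
# (equivalent to LEFT JOIN ... ON TRUE, since the constant set has two rows).
GRAPH_SQL = """
SELECT p.Id, p.Name, Splitter.Id, s.Id, s.Name, d.Id, d.Name
FROM Parents p
CROSS JOIN (SELECT 1 AS Id UNION ALL SELECT 2) AS Splitter
LEFT JOIN Sons s ON s.ParentId = p.Id AND Splitter.Id = 1
LEFT JOIN Daughters d ON d.ParentId = p.Id AND Splitter.Id = 2
WHERE p.Id = ?
"""


def load_parent(conn: sqlite3.Connection, parent_id: int):
    """Rebuild one parent with separate sons/daughters lists, or None if absent."""
    parent = None
    rows = conn.execute(GRAPH_SQL, (parent_id,))
    for p_id, p_name, split, s_id, s_name, d_id, d_name in rows:
        if parent is None:
            # Parent columns are repeated on every row; read them only once.
            parent = {"id": p_id, "name": p_name, "sons": [], "daughters": []}
        # Splitter.Id says which collection the row belongs to; a NULL child
        # id means the parent has no children of that kind.
        if split == 1 and s_id is not None:
            parent["sons"].append({"id": s_id, "name": s_name})
        elif split == 2 and d_id is not None:
            parent["daughters"].append({"id": d_id, "name": d_name})
    return parent
```

Since the query has no ORDER BY, the order of entries within each list is unspecified.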

Issue Analytics

  • State: open
  • Created 3 years ago
  • Reactions: 8
  • Comments: 16 (8 by maintainers)

Top GitHub Comments

2 reactions
Emill commented, Mar 1, 2021

An approach similar to this is already used in EF6. It works like this.

It applies when there is one parent table and at least two child tables.

Instead of doing

SELECT
parent.id AS parent_id,
parent.col1 AS parent_col1,
son.id AS son_id,
son.col1 AS son_col1,
daughter.id AS daughter_id,
daughter.col1 AS daughter_col1
FROM parent
LEFT JOIN son ON parent.id = son.parent_id
LEFT JOIN daughter ON parent.id = daughter.parent_id
ORDER BY parent.id, son.id, daughter.id

it does the following

SELECT parent_id, parent_col1, son_id, son_col1, daughter_id, daughter_col1, c1
FROM (
    (SELECT CASE WHEN (son.id IS NULL) THEN (NULL) ELSE (1) END AS c1,
    parent.id AS parent_id,
    parent.col1 AS parent_col1,
    son.id AS son_id,
    son.col1 AS son_col1,
    NULL AS daughter_id,
    NULL AS daughter_col1
    FROM parent LEFT JOIN son ON parent.id = son.parent_id)
  UNION ALL
    (SELECT 2 AS c1,
    parent.id AS parent_id,
    parent.col1 AS parent_col1,
    NULL AS son_id,
    NULL AS son_col1,
    daughter.id AS daughter_id,
    daughter.col1 AS daughter_col1
    FROM parent INNER JOIN daughter ON parent.id = daughter.parent_id)
) AS t
ORDER BY parent_id, c1

I don’t know why it fetches all the parent columns a second time, though; they could easily be replaced by NULLs (except for the identifier).
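
Here is a minimal SQLite sketch of the EF6-style query, with the second branch returning NULL for parent_col1 as suggested above, plus a hypothetical load_parents helper that uses c1 to tell the branches apart. Because the first branch LEFT JOINs son, every parent appears there at least once, so the parent's own columns can always be read from those rows. SQLite does not accept parenthesized SELECTs inside a compound query, so the branches are written without parentheses. The helper also avoids relying on where NULL sorts in ORDER BY, since that differs between databases.

```python
import sqlite3

# EF6-style single query adapted for SQLite; parent_col1 is NULL in the
# second branch because the first branch already carries it.
UNION_SQL = """
SELECT parent_id, parent_col1, son_id, son_col1, daughter_id, daughter_col1, c1
FROM (
    SELECT CASE WHEN son.id IS NULL THEN NULL ELSE 1 END AS c1,
           parent.id AS parent_id, parent.col1 AS parent_col1,
           son.id AS son_id, son.col1 AS son_col1,
           NULL AS daughter_id, NULL AS daughter_col1
    FROM parent LEFT JOIN son ON parent.id = son.parent_id
    UNION ALL
    SELECT 2 AS c1,
           parent.id AS parent_id, NULL AS parent_col1,
           NULL AS son_id, NULL AS son_col1,
           daughter.id AS daughter_id, daughter.col1 AS daughter_col1
    FROM parent INNER JOIN daughter ON parent.id = daughter.parent_id
) AS t
ORDER BY parent_id, c1
"""


def load_parents(conn: sqlite3.Connection):
    """Group the flat rows into {parent_id: {"col1", "sons", "daughters"}}."""
    parents = {}
    for p_id, p_col1, s_id, s_col1, d_id, d_col1, c1 in conn.execute(UNION_SQL):
        entry = parents.setdefault(p_id, {"col1": None, "sons": [], "daughters": []})
        if c1 == 2:
            # Second branch: one daughter per row, parent columns are NULL.
            entry["daughters"].append((d_id, d_col1))
        else:
            # First branch: c1 is NULL (no sons) or 1 (one son per row).
            entry["col1"] = p_col1
            if c1 == 1:
                entry["sons"].append((s_id, s_col1))
    return parents
```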

2 reactions
smitpatel commented, Sep 14, 2020

"We must compare the best possible case to the cartesian product case."

That is my point, and @roji has made it at multiple points as well. This works really well for some scenarios but not all, and we need to evaluate the pros and cons of both sides.
