Avoid cartesian product without query splitting
Query splitting was introduced, but I would personally avoid its drawbacks like the plague.
Luckily, we can usually avoid cartesian products through careful aggregate design. Still, sometimes we truly do want to join multiple 1:N relations. Of course, we would rather not risk a huge data set caused by an incidental high number of child entities.
This problem can be solved without resorting to query splitting.
As an example, say we are selecting one `Parent` and joining both its `Son`s and its `Daughter`s.
Basically, since the joined siblings are independent, we have no reason to want them multiplied. This can be accomplished by explicitly instructing the database to keep them separate:
- Join a set of constants. Since we intend to join two 1:N relations, we will join two constants: `LEFT JOIN (SELECT 1 AS Id UNION ALL SELECT 2) AS Splitter`.
- To the `Son`'s join condition, add: `AND Splitter.Id = 1`.
- To the `Daughter`'s join condition, add: `AND Splitter.Id = 2`.
This gives us:
SELECT *
FROM Parents p
LEFT JOIN (SELECT 1 AS Id UNION ALL SELECT 2) AS Splitter ON TRUE
LEFT JOIN Sons s ON s.ParentId = p.Id AND Splitter.Id = 1
LEFT JOIN Daughters d ON d.ParentId = p.Id AND Splitter.Id = 2
WHERE p.Id = 1
While transferring a bit more data (mostly in the form of duplication of the `Parent`), the duplication stays linear and well under control.
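To make the "linear" claim concrete, here is a minimal sketch that runs both query shapes against an in-memory SQLite database and compares row counts. The schema (tables with only `Id` and `ParentId` columns) and the helper name `count_rows` are assumptions for illustration, and `ON 1 = 1` stands in for `ON TRUE` so the query also runs on older SQLite builds.

```python
import sqlite3


def count_rows(num_sons, num_daughters):
    """Return (naive_row_count, splitter_row_count) for a single parent."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Parents (Id INTEGER PRIMARY KEY);
        CREATE TABLE Sons (Id INTEGER PRIMARY KEY, ParentId INTEGER);
        CREATE TABLE Daughters (Id INTEGER PRIMARY KEY, ParentId INTEGER);
        INSERT INTO Parents (Id) VALUES (1);
    """)
    conn.executemany("INSERT INTO Sons (ParentId) VALUES (?)",
                     [(1,)] * num_sons)
    conn.executemany("INSERT INTO Daughters (ParentId) VALUES (?)",
                     [(1,)] * num_daughters)

    # Plain double join: every son is paired with every daughter.
    naive = conn.execute("""
        SELECT *
        FROM Parents p
        LEFT JOIN Sons s ON s.ParentId = p.Id
        LEFT JOIN Daughters d ON d.ParentId = p.Id
        WHERE p.Id = 1
    """).fetchall()

    # Splitter join: sons land on Splitter row 1, daughters on Splitter row 2.
    split = conn.execute("""
        SELECT *
        FROM Parents p
        LEFT JOIN (SELECT 1 AS Id UNION ALL SELECT 2) AS Splitter ON 1 = 1
        LEFT JOIN Sons s ON s.ParentId = p.Id AND Splitter.Id = 1
        LEFT JOIN Daughters d ON d.ParentId = p.Id AND Splitter.Id = 2
        WHERE p.Id = 1
    """).fetchall()

    conn.close()
    return len(naive), len(split)


print(count_rows(3, 4))    # (12, 7)
print(count_rows(10, 10))  # (100, 20)
```

With S sons and D daughters (both at least one), the plain join returns S × D rows while the splitter variant returns S + D.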
When we combine careful aggregate design with avoiding the cartesian product, we have all the tools we need to load reasonable object graphs without introducing the significant drawbacks of split queries.
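On the consuming side, each result row carries at most one child, so rebuilding the object graph is a single pass that skips NULLs. The following is only a sketch under assumed names: `make_db` and `load_parent` are hypothetical helpers, the schema is reduced to `Id` and `ParentId` columns, and a plain dictionary stands in for whatever entity types an ORM would materialize.

```python
import sqlite3


def make_db(son_ids, daughter_ids):
    """Create an in-memory database with one parent (Id = 1) and its children."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Parents (Id INTEGER PRIMARY KEY);
        CREATE TABLE Sons (Id INTEGER PRIMARY KEY, ParentId INTEGER);
        CREATE TABLE Daughters (Id INTEGER PRIMARY KEY, ParentId INTEGER);
        INSERT INTO Parents (Id) VALUES (1);
    """)
    conn.executemany("INSERT INTO Sons (Id, ParentId) VALUES (?, 1)",
                     [(i,) for i in son_ids])
    conn.executemany("INSERT INTO Daughters (Id, ParentId) VALUES (?, 1)",
                     [(i,) for i in daughter_ids])
    return conn


def load_parent(conn, parent_id):
    """Load one parent with both child collections in a single round trip."""
    rows = conn.execute("""
        SELECT p.Id, s.Id, d.Id
        FROM Parents p
        LEFT JOIN (SELECT 1 AS Id UNION ALL SELECT 2) AS Splitter ON 1 = 1
        LEFT JOIN Sons s ON s.ParentId = p.Id AND Splitter.Id = 1
        LEFT JOIN Daughters d ON d.ParentId = p.Id AND Splitter.Id = 2
        WHERE p.Id = ?
    """, (parent_id,)).fetchall()
    if not rows:
        return None

    parent = {"id": rows[0][0], "sons": [], "daughters": []}
    for _, son_id, daughter_id in rows:
        # A row holds a son, a daughter, or neither, but never both.
        if son_id is not None:
            parent["sons"].append(son_id)
        if daughter_id is not None:
            parent["daughters"].append(daughter_id)
    return parent


conn = make_db(son_ids=[10, 11], daughter_ids=[20, 21, 22])
print(load_parent(conn, 1))
```

Because every child appears in exactly one row, no deduplication of children is needed; only the parent columns repeat.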
An approach similar to this is already used in EF6. It works like this.
The case is where there is one parent table and at least two child tables.
Instead of doing
it does the following
I don’t know why it fetches all the parent columns the second time, though; they could easily be replaced by NULLs (except for the identifier).
That is my point, and @roji has said the same at multiple points. This works really well for some scenarios but not all, and we need to evaluate the pros and cons of both approaches.