Incorrect result in certain situations with multiple joins with different partition schemes
This is a bug introduced by #12013. The result would be wrong if the following situation occurs:
- The query uses COALESCE(joinKey) on top of a FULL OUTER JOIN with an equi-join.
- The children of the FullJoin node use a different hash function to compute the partition from the join keys. For example, the hash is computed on (a, constant) while the join key is just a.
- There is another JOIN with the result of the FULL OUTER JOIN, using an equi-join on only the coalesced keys of the FULL OUTER JOIN.
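The conditions above can be read as a question about partitioning properties: may the next join skip its shuffle? The following Python sketch is a simplified, hypothetical model and not Presto's planner code. HashPartitioning, the two check functions, and the hash name "h" are all invented for illustration. It contrasts a check that looks only at the non-constant hash arguments with one that requires the exact same hash input.

```python
from dataclasses import dataclass
from typing import Tuple

CONSTANT = "<constant>"  # placeholder for a constant hash argument

@dataclass(frozen=True)
class HashPartitioning:
    # Hypothetical model of a partitioning property (not Presto's actual class)
    hash_function: str          # identifier of the hash function
    arguments: Tuple[str, ...]  # expressions fed to the hash, in order

def naive_can_skip_shuffle(upstream: HashPartitioning, join_keys: Tuple[str, ...]) -> bool:
    # Illustrative flawed check: dropping constant arguments makes
    # hash(a, constant) look like "partitioned on a"
    non_constant = tuple(arg for arg in upstream.arguments if arg != CONSTANT)
    return non_constant == join_keys

def strict_can_skip_shuffle(upstream: HashPartitioning, required: HashPartitioning) -> bool:
    # Safe check: skip the shuffle only if the next join would compute exactly
    # the same hash (same function, same arguments, same order)
    return upstream == required

upstream = HashPartitioning("h", ("a", CONSTANT))  # children hashed on (a, constant)
required = HashPartitioning("h", ("a",))           # next join hashes on a alone

print(naive_can_skip_shuffle(upstream, ("a",)))     # True: shuffle wrongly skipped
print(strict_can_skip_shuffle(upstream, required))  # False: shuffle kept
```

Comparing the whole partitioning description, rather than a filtered view of its columns, is the conservative choice. It may keep a shuffle that is not strictly needed, but it never drops a required one.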
In such a situation, the newly introduced optimization would assume that the result of the FULL OUTER JOIN is already partitioned on COALESCE(a), so there is no need for another shuffle before the next join. However, because the hash function is calculated on (a, constant), even if the data is “partitioned on a”, a given row can sit on a different node than the one a hash computed on just a would send it to. Thus a shuffle is still needed to produce the correct result.
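The effect can be reproduced with a small Python simulation. This is a sketch under stated assumptions, not Presto code. The partition count of 4, the SHA-256-based hash, and the constant "c" are all hypothetical. Each side of the next join is hash-partitioned, and the join then runs partition by partition without a shuffle. That is only correct when both sides were partitioned by the same hash over the same keys.

```python
import hashlib
from typing import Callable, List, Tuple

NUM_PARTITIONS = 4  # assumed number of partitions (worker nodes)
CONSTANT = "c"      # hypothetical constant mixed into the upstream hash

def partition_of(hash_args: Tuple[object, ...]) -> int:
    # Deterministic hash; Python's built-in hash() is randomized per process for str
    digest = hashlib.sha256(repr(hash_args).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

def partition_rows(rows: List[int], hash_args: Callable[[int], Tuple[object, ...]]) -> List[List[int]]:
    # Route each row to the partition chosen by hashing hash_args(row)
    parts: List[List[int]] = [[] for _ in range(NUM_PARTITIONS)]
    for row in rows:
        parts[partition_of(hash_args(row))].append(row)
    return parts

def co_located_join(left: List[List[int]], right: List[List[int]]) -> List[int]:
    # Equi-join partition i of the left side only with partition i of the right
    # side (no shuffle); correct only if both sides used the same hash
    matches: List[int] = []
    for left_part, right_part in zip(left, right):
        right_keys = set(right_part)
        matches.extend(key for key in left_part if key in right_keys)
    return sorted(matches)

keys = list(range(20))
# Output of the FULL OUTER JOIN: still laid out by hash(a, constant)
full_join_output = partition_rows(keys, lambda a: (a, CONSTANT))
# Other input of the next join: laid out by hash(a)
other_side = partition_rows(keys, lambda a: (a,))

# Skipping the shuffle (the buggy plan) misses keys whose two hashes disagree
wrong = co_located_join(full_join_output, other_side)
# Re-partitioning on hash(a) first (the correct plan) finds every match
correct = co_located_join(partition_rows(keys, lambda a: (a,)), other_side)

print(len(wrong), len(correct))
```

Here correct always contains all 20 keys. The skipped-shuffle plan typically finds only a fraction of them, because each key lands in the same partition under both hashes only by chance.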
Issue Analytics
- Created 4 years ago
- Comments: 5 (4 by maintainers)
Top GitHub Comments
Here’s an example that can produce the wrong plan. It’s harder to come up with a query that can produce meaningful & wrong results though:
@tooptoop4 Yes, #12946 reintroduced the FULL OUTER JOIN + COALESCE optimization, and it should not have this bug. It will be released in 0.227.