Broaden the join types that can be used with UNNEST.
See original GitHub issueThis is the first time I’ve raised an issue in this repo, please let me know if this should be raised elsewhere / stated differently / other pre-requirements.
TL;DR: When using UNNEST, joins other than CROSS JOIN should be available to limit the rows being returned.
When expanding an array (using UNNEST and CROSS JOIN
), it is often necessary to limit the result set for each row with a subsequent where
clause. This can become a performance issue when the expanded array has many elements in it and there are many rows being unnested and returned before then being discarded.
Currently this has to be done as follows:
-- we only want items below 50
select name, item
from (
values
('Sarah', sequence(1, 10000000, 1)),
('Jose', Array[1, 10000000, 1])
) as results (name, items)
cross join unnest(items) as t (item)
where item < 50;
I want to be able to write:
-- we only want items below 50
select name, item
from (
values
('Sarah', sequence(1, 10000000, 1)),
('Jose', Array[1, 10000000, 1])
) as results (name, items)
left join unnest(items) as t (item) on item < 50;
or with inner join
.
Executing the above statement produces the following error:
UNNEST on other than the right side of CROSS JOIN is not supported
.
Is there a good reason for this restriction? I can’t find any discussion of this in the issues or on other forums and would suggest that the expected behaviour of using different joins is clear and should be supported.
Issue Analytics
- State:
- Created 6 years ago
- Reactions:16
- Comments:11 (4 by maintainers)
Any chance that LEFT JOIN UNNEST will be added to Presto?
@JonNorman The
ON
part ofJOIN
defines the condition of the join. I don’t think SQL spec necessarily dictates the implementation. “Instead of expanding every row and then filtering it out, we would only be joining to specific values in the array and then returning those.” is an implementation optimization. We can potentially do this optimization without changing semantics. Theoretically query engine can decide that we can push downitem < 50
toUNNEST
inAs long as the query engine is smart enough. Presto currently has a quite naive (and inefficient) implementation of
UNNEST
. Whether we should optimize its performance should be discussed orthogonally from whether we should support more join types.