Ephemeral model bug: nested CTE cannot reference previous CTE
Given a query like:
with my_first_cte as (
    with my_sub_cte as (
        select 1 as fun
    )
    select * from my_sub_cte
),
my_second_cte as (
    with my_next_sub_cte as (
        select * from my_first_cte
    )
    select * from my_next_sub_cte
)
select * from my_second_cte
Returns the following error:
ERROR processing query/statement. Error Code: 0, SQL state: org.apache.spark.sql.AnalysisException: Table or view not found: my_first_cte; line 17 pos 18
This prevents us from chaining multiple ephemeral models in a dependency chain, since the second CTE (ephemeral model) includes sub-CTEs that reference the first CTE (ephemeral model).
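For comparison, here is a hand-written sketch of the same query with the sub-CTEs hoisted to the top level, so that no nested WITH clause has to resolve a name from the enclosing scope. This is only an illustration of where the failure comes from, not something dbt generates, and it assumes the sub-CTE names do not collide once they share one scope.

```sql
-- Flattened equivalent: every CTE lives at the top level and only
-- references CTEs defined earlier in the same WITH clause.
with my_sub_cte as (
    select 1 as fun
),
my_first_cte as (
    select * from my_sub_cte
),
my_next_sub_cte as (
    -- this reference is now a plain sibling lookup, not a lookup
    -- from inside a nested WITH
    select * from my_first_cte
),
my_second_cte as (
    select * from my_next_sub_cte
)
select * from my_second_cte
```

Sequential references between sibling CTEs like these should resolve on Spark 2.x as well, which suggests the error is specific to a nested WITH looking up an outer CTE.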

This bug is actually resolved in Spark 3.x, which offers greater support for nested CTEs/subqueries. While it’s still an issue in Spark 2.x, I’m going to close this, as we don’t plan to make any code changes.
@aaronsteers fyi, check out https://github.com/fishtown-analytics/dbt/issues/1248 and https://github.com/fishtown-analytics/dbt/pull/1283
I think sql parsing + inlining CTEs is a cool idea - it would definitely give us the best of both worlds:
But, dbt doesn’t do very much SQL parsing at all, and we generally tend to bet on databases better supporting the SQL standard over time rather than trying to work around their current limitations.
I don’t think we have any immediate plans to do this type of SQL parsing/rewriting, but it could definitely be in scope for the future. For the moment, the shortest path to supporting ephemeral models is probably going to be a subquery-based implementation.
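As a rough picture of what a subquery-based implementation could produce for the query in the original report, the sketch below inlines each ephemeral model as an aliased subquery at the point where it is referenced, instead of prepending it as a CTE. The shape of this output is an assumption made for illustration, not dbt's actual compiled SQL, and it relies on the engine accepting a WITH clause inside a subquery, which has not been verified here for any particular Spark 2.x release.

```sql
-- Hypothetical compiled output: each ephemeral model body is wrapped in
-- parentheses and aliased with the model's name where it is referenced.
select *
from (
    -- body of the second ephemeral model
    with my_next_sub_cte as (
        select *
        from (
            -- body of the first ephemeral model, inlined in place of
            -- the reference to my_first_cte
            with my_sub_cte as (
                select 1 as fun
            )
            select * from my_sub_cte
        ) as my_first_cte
    )
    select * from my_next_sub_cte
) as my_second_cte
```

Because every reference is replaced by the model body itself, no nested WITH clause has to look up a CTE defined in an outer scope, at the cost of repeating a model's SQL wherever it is referenced more than once.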