dbt models that are parquet files
I haven’t thought through this deeply. It might not make sense, it might require changes to dbt, or it might already work? But I wanted to raise it just in case, because it would help me out with something I’m building.
Could we make it possible to configure dbt-duckdb so that

```sql
select
bla
from {{ ref("orders") }}
```

compiles to the following?

```sql
select
bla
from 's3://bucket/orders.parquet'
```
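A minimal sketch of the requested compile behavior, assuming a hypothetical per-model `external_locations` mapping (this is not an existing dbt-duckdb option, and `render_ref` / `compile_select` are illustrative names): a model with an external location renders as a quoted file path that DuckDB can scan directly, and every other model renders as a normal relation name.

```python
from typing import Mapping, Optional


def render_ref(
    model: str,
    external_locations: Mapping[str, str],
    schema: str = "main",
) -> str:
    """Render what {{ ref(model) }} might compile to (hypothetical helper).

    Models listed in `external_locations` compile to a single-quoted file path;
    all other models compile to an ordinary schema-qualified relation name.
    """
    location: Optional[str] = external_locations.get(model)
    if location is not None:
        # Double any embedded single quotes so the path stays a valid SQL string literal.
        escaped = location.replace("'", "''")
        return f"'{escaped}'"
    return f"{schema}.{model}"


def compile_select(model: str, external_locations: Mapping[str, str]) -> str:
    # Mirrors the example model: select bla from {{ ref("orders") }}
    return f"select\nbla\nfrom {render_ref(model, external_locations)}"


if __name__ == "__main__":
    locations = {"orders": "s3://bucket/orders.parquet"}
    print(compile_select("orders", locations))
```

Keeping the mapping outside the model SQL means the model text stays unchanged; only the resolution of `ref` differs.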
Top GitHub Comments
I wired up a version of this idea here: https://github.com/jwills/dbt-duckdb/compare/jwills_file_based_dbs?expand=1
I got things working, but I didn’t feel great about it. My “idea” was to take advantage of the fact that the `database` config parameter is a no-op for DuckDB: if you specify a path instead of the default (`main`), I do some hacks to treat any models under that database + schema path as parquet/CSV files, including when you `ref` them in other models. For example, a ref’d model that uses the `parquet` materialization is rendered, when it is queried, as `<database>/<schema>/<model>.parquet` instead of `database.schema.model`.
I like what @tomsej is saying better, though (i.e., the `parquet` materialization acts like a `view` over a parquet file, where the location the parquet file(s) should be materialized to is specified… somewhere?), because it keeps the metadata catalog where it belongs: inside DuckDB, rather than externally managed via dbt-duckdb + the filesystem. Radek’s approach means that we don’t have to jump through a whole bunch of hoops inside the `DuckDBAdapter` and `DuckDBRelation` classes (as I do in the above branch) to render the relation differently when it’s a parquet/CSV file instead of a regular table/view. To me, that makes the `parquet` materialization syntactic sugar, equivalent to materializing a view model that has a post-hook which does the `COPY (SELECT * FROM {{ this }}) TO '/path/to/output.parquet'` for us, and I would be 👍 for such a materialization (and maybe a `csv` one that did the same sort of thing, if we were so inclined?)

I was thinking about possible solutions for this too. I think this is only viable for `table` materializations: `parquet` does not support updates (unlike Iceberg or Delta). I was thinking about introducing a new type of materialization (or adding some extra parameters to the current one), e.g. `parquet`. With that, the last steps of the table materialization (usually something like `CREATE TABLE ...`) would instead be `COPY (SELECT ...) TO '<location parameter>.parquet'` and `CREATE VIEW <model name> AS SELECT * FROM '<location parameter>.parquet'`, so the `table` would not be an actual table but a view on the parquet file. I think this is similar to what @AlexanderVR is doing. Any thoughts?
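A rough sketch of the final statements such a `parquet` materialization could emit, written as a Python helper that only builds SQL strings. `parquet_materialization_sql` and its `location` argument are hypothetical stand-ins for the "location parameter"; `COPY (...) TO ... (FORMAT PARQUET)` and `read_parquet(...)` are DuckDB SQL. It assumes `view_name` is an unqualified name.

```python
from typing import List


def quote_literal(value: str) -> str:
    # SQL single-quoted string literal with embedded quotes doubled.
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name: str) -> str:
    # SQL double-quoted identifier with embedded quotes doubled.
    return '"' + name.replace('"', '""') + '"'


def parquet_materialization_sql(view_name: str, select_sql: str, location: str) -> List[str]:
    """Return the last steps of a hypothetical `parquet` materialization.

    Instead of CREATE TABLE ... AS, the model's SELECT is written to a parquet
    file with COPY, then a view over that file is registered so the catalog
    entry stays inside DuckDB.
    """
    path = quote_literal(f"{location}.parquet")
    return [
        f"COPY ({select_sql}) TO {path} (FORMAT PARQUET)",
        f"CREATE OR REPLACE VIEW {quote_identifier(view_name)} AS SELECT * FROM read_parquet({path})",
    ]
```

With the `duckdb` Python package, each statement could be passed to `con.execute(...)` in order. The file is rewritten in full on every run, which fits the point that parquet files don't support in-place updates; `CREATE OR REPLACE` lets reruns repoint the view without a separate drop.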