Add a generate_database_name() macro
Describe the feature
In the current version (v0.14.0) there is a {{ generate_schema_name_for_env }} macro, which works very well in dev mode: a production run writes to the specified schema, while a dev run writes all tables and views to a dev schema. In the same way, we need a {{ generate_database_name_for_env }} macro for when a database is configured in the dbt_project.yml file.
Currently:
If my dbt_project.yml file has a section under models that reads:
models:
  product:
    database: mart_db
    materialized: view
    schema: mart_schema
and in my profiles.yml file I have:
my_dbt:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: *******
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
      role: "ur_{{ env_var('DBT_USER') }}"
      database: dbt_dev
Then my models in dev mode will be written to mart_db instead of dbt_dev.
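The difference between today's behavior and the requested one can be sketched in plain Python. This is only a model of the name-resolution rule, not dbt code: the function names, the `prod_target` parameter, and the `analytics` database are assumptions made for illustration, while `mart_db` and `dbt_dev` come from the example above.

```python
from typing import Optional


def current_database(custom_database: Optional[str], target_database: str) -> str:
    """Behavior described in the issue: a configured database always wins."""
    return custom_database.strip() if custom_database else target_database


def database_name_for_env(
    custom_database: Optional[str],
    target_name: str,
    target_database: str,
    prod_target: str = "prod",  # assumed name of the production target
) -> str:
    """Requested behavior: honor the configured database only in production."""
    if target_name == prod_target and custom_database:
        return custom_database.strip()
    return target_database


# Values from the dbt_project.yml / profiles.yml example; "analytics" is hypothetical.
print(current_database("mart_db", "dbt_dev"))                  # today, in dev
print(database_name_for_env("mart_db", "dev", "dbt_dev"))      # requested, in dev
print(database_name_for_env("mart_db", "prod", "analytics"))   # requested, in prod
```

Under these assumptions the dev run resolves to dbt_dev while the production run still resolves to mart_db, which mirrors what the schema macro already does for schemas.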
Describe alternatives you’ve considered
Right now, to solve this, I’ve created an alternative ref macro called xref to override this behavior, but it feels a bit clunky, and I will have to tell our dbt devs to all use {{ xref('some table') }} instead of the built-in ref function.
Additional context
Not database specific, it’s a dbt issue.
Who will this benefit?
Anyone who wants to specify a set of production databases in their dbt_project.yml file, in the same way that they might already do for their schemas using the existing {{ generate_schema_name_for_env }} macro, but who also wants dbt to write all tables and views into a single schema when in dev mode.
Issue Analytics
- Created 4 years ago
- Reactions: 2
- Comments: 8 (3 by maintainers)
Let me clean up my xref macro a bit and add some comments. In the meantime, I’ll try to explain this part a bit better:
"I have to override the normal ref() function and return instead the final production table name"
So, just to preface this: what I describe below (and the condition in xref()) only happens when target.name != prod AND env_var('DEVELOPER_TYPE') == tier2; otherwise dbt runs as normal.
Normally in dbt you create a hierarchy of models, i.e.
[actual raw table] -> model_raw -> model_stage -> model_bizready -> model_mart
This assumes you have access to [actual raw table], but in my case tier2 developers only have access to [production bizready table] (created by a previous production run of dbt).
So when a tier2 developer executes:
dbt run -m +model_mart
dbt will try to execute the whole DAG, which will fail because tier2 can’t access the [actual raw table]. To work around this, my xref() macro figures out that xref(model_bizready) should not keep traversing the DAG as normal, but should instead rewrite the DAG to reference the production table that the tier2 dev does have access to. So for xref(model_bizready) the DAG becomes:
[production bizready table] -> model_mart
That way the tier2 developer doesn’t need access to bizready or anything preceding it, which they don’t have access rights to anyway, but they’re still able to contribute to and execute dbt for any mart models they wish to work on.
Conversely, a tier1 developer or a production process executing the same
dbt run -m +model_mart
will execute the whole DAG normally, i.e.
[actual raw table] -> model_raw -> model_stage -> model_bizready -> model_mart
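The DAG rewrite described above can be modeled in a few lines of plain Python. This is a rough sketch rather than the actual xref macro: the `PARENTS` dictionary and both function names are hypothetical, and hard-coding `model_bizready` as the cut point is a simplification.

```python
from typing import Dict, List, Set

# Hypothetical DAG mirroring the chain in the comment: model -> direct parents.
PARENTS: Dict[str, List[str]] = {
    "model_raw": [],
    "model_stage": ["model_raw"],
    "model_bizready": ["model_stage"],
    "model_mart": ["model_bizready"],
}


def uses_production_ref(model: str, target_name: str, developer_type: str) -> bool:
    """True when xref() should point at the production table instead of the model."""
    return (
        target_name != "prod"
        and developer_type == "tier2"
        and model == "model_bizready"  # simplification: single bizready model
    )


def models_to_build(selected: str, target_name: str, developer_type: str) -> List[str]:
    """Walk upstream from `selected`, stopping at parents served from production."""
    seen: Set[str] = set()
    order: List[str] = []

    def visit(model: str) -> None:
        if model in seen:
            return
        seen.add(model)
        for parent in PARENTS[model]:
            # A parent read from production is not built, so its ancestors are skipped too.
            if not uses_production_ref(parent, target_name, developer_type):
                visit(parent)
        order.append(model)  # parents are appended before the model itself

    visit(selected)
    return order


print(models_to_build("model_mart", "dev", "tier2"))  # only the mart model
print(models_to_build("model_mart", "dev", "tier1"))  # the whole chain
```

In this model a tier2 run of +model_mart builds only model_mart, while a tier1 or production run builds all four models in dependency order.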
@drewbanin
Ok here’s what my xref macro is looking like:
All bizready tables are prefixed with br_, so a sample table might be br_s1__some_dataset.
In prod mode, or when DEVELOPER_TYPE is RAW_DEV, xref just acts like ref. But if the developer mode is BIZREADY_DEV and a br_ table is being referenced, then it rewrites the table reference to point to production. If target != prod I always do the mapping, because I need xref to fail if anyone adds a new bizready schema.
Anyway, hopefully the use case makes sense. The database name override isn’t the only piece I’d need to do this more elegantly; I’d also need to somehow get the custom schema name from the config.
Originally I was getting the custom schema name if the developer mode was BIZREADY_DEV, but dbt creates the schemas up front instead of only when something is written to them. This meant that if two devs were running dbt at the same time, one would error out because dbt wouldn’t be able to create the custom schemas again (the first dev would already have created them). That’s why I have the clumsy lookup table.
Not sure if you have some better ways I might solve this use case, but being able to override the ref function is probably the first step, and then maybe having more ways to look up information about a reference name, i.e. to see if it has a custom schema, custom database configured, etc.?
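For readers without the macro in front of them, the rules described in this comment might look roughly like the following plain-Python model. It is a guess at the shape of the logic, not the author's macro: the `PROD_LOCATIONS` table, its `prod_db.bizready_s1` value, the convention that the schema prefix is everything before the double underscore, and the `ref(...)` marker string are all assumptions.

```python
from typing import Dict

# Hypothetical lookup table: bizready prefix -> production location.
# The value on the right is a placeholder, not taken from the issue.
PROD_LOCATIONS: Dict[str, str] = {
    "br_s1": "prod_db.bizready_s1",
}


def xref(model_name: str, target_name: str, developer_type: str) -> str:
    """Return a production relation name, or a marker meaning 'behave like ref()'."""
    if target_name != "prod" and model_name.startswith("br_"):
        # Outside prod the mapping is always resolved, so an unmapped bizready
        # schema fails loudly regardless of the developer type.
        prefix = model_name.split("__", 1)[0]  # assumed naming convention
        if prefix not in PROD_LOCATIONS:
            raise KeyError(f"no production mapping for bizready schema {prefix!r}")
        if developer_type == "BIZREADY_DEV":
            return f"{PROD_LOCATIONS[prefix]}.{model_name}"
    return f"ref({model_name})"  # marker: defer to the normal ref()


print(xref("br_s1__some_dataset", "dev", "BIZREADY_DEV"))
print(xref("br_s1__some_dataset", "dev", "RAW_DEV"))
print(xref("br_s1__some_dataset", "prod", "BIZREADY_DEV"))
```

Resolving the mapping before checking the developer type is what makes a newly added, unmapped bizready schema fail for every non-prod run, which is the behavior the comment asks for.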