To allow specifying the gcp project for bigquery.jobuser in profile.yml


Describe the feature

Allow dbt to process BigQuery data in one project using query jobs that run in another project.

BigQuery-context

We have different types of dbt runs in production (hourly runs, daily runs, 10-minute runs) that process the same dataset. Some runs, such as the 10-minute runs, are more critical than the others and require a dedicated BQ slot reservation.

To do that, it’s a common and necessary practice in BigQuery workload management to process BigQuery data in one project (gcp_project_A.prod_dataset) using BigQuery slot resources from another project (gcp_project_B).

For this, a service account, gcp-service-account-1, would have the bigquery.dataViewer role on gcp_project_A.prod_dataset and the bigquery.jobUser role in gcp_project_B.

The Python code below processes gcp_project_A.prod_dataset using a query job that runs in another project, gcp_project_B.

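from google.cloud import bigquery

database = "gcp_project_A"          # project that owns the data
gcp_job_project = "gcp_project_B"   # project in which the query job runs

# The client's project determines where the query job is created.
client = bigquery.Client(
    project=gcp_job_project,
    # credentials=creds,
    location="US",
)

# Qualify the table with the data project, not the job project.
sql = "SELECT count(*) AS count FROM `{}.prod_dataset.table_1`".format(database)

query_job = client.query(sql)
query_result = query_job.result(timeout=20)
print(list(query_result))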

dbt-context

Currently, there is no easy way in dbt for gcp-service-account-1 to process gcp_project_A.prod_dataset using a query job from gcp_project_B, because profile.yml specifies only one GCP project, which is used both as the project of the dataset and as the project of the BQ job user, as shown below.

    stage:
      type: bigquery
      project: gcp_project_A
      dataset: prod_dataset
      keyfile: /mnt/gcp-service-account-1.json
      method: service-account

In the code linked below, the database (gcp_project_A) is also used as the project of the bigquery.Client, which is where the BQ query job runs.

https://github.com/fishtown-analytics/dbt/blob/fec0e31a25d5b922cb1833cffcb5095eb4ee642b/plugins/bigquery/dbt/adapters/bigquery/connections.py#L218-L220

suggested-solution

Add a new optional key to profile.yml, named, for instance, bq_job_project. If bq_job_project is present, as below, pass it to bigquery.Client as the project.

    stage:
      type: bigquery
      project: gcp_project_A
      dataset: prod_dataset
      bq_job_project: gcp_project_B
      keyfile: /mnt/gcp-service-account-1.json
      method: service-account
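
The fallback proposed above could look roughly like the sketch below. It is only an illustration, not dbt's actual connection code: resolve_job_project is a hypothetical helper, the target is modelled as a plain dict, and bq_job_project is the key name suggested in this issue rather than an existing dbt setting.

```python
from typing import Mapping, Optional


def resolve_job_project(target: Mapping[str, str]) -> str:
    """Return the project in which query jobs should run.

    `bq_job_project` is the hypothetical optional key proposed in this issue.
    When it is missing or empty, fall back to `project`, which matches the
    current single-project behaviour.
    """
    job_project: Optional[str] = target.get("bq_job_project")
    return job_project or target["project"]


# Target with the proposed key: jobs would run in gcp_project_B.
with_override = {
    "type": "bigquery",
    "project": "gcp_project_A",
    "dataset": "prod_dataset",
    "bq_job_project": "gcp_project_B",
}

# Target without the key: jobs keep running in the data project.
without_override = {
    "type": "bigquery",
    "project": "gcp_project_A",
    "dataset": "prod_dataset",
}

print(resolve_job_project(with_override))
print(resolve_job_project(without_override))
```

The resolved value would then be passed as the project argument of bigquery.Client, while project in the profile would continue to define where models are built by default.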

Describe alternatives you’ve considered

Currently, the workaround is:

  1. set project: gcp_project_B in profile.yml
  2. overwrite the generate_database_name() macro so that it generates gcp_project_A

It could work for us in the short run. However, when we later need generate_database_name() for its intended purpose, which is to generate a different custom_database_name for different models in the same dbt run, we would run out of options.

Additional context

This feature is BigQuery-specific. I don’t think it’s relevant to other databases.

Who will this benefit?

Any team or project that requires more advanced BigQuery workload management, such as BQ reservation allocations for dbt ETL loads.

Are you interested in contributing this feature?

Yes, I would be happy to make a PR for this.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 8
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
jtcohen6 commented, Aug 13, 2021

@hui-zheng Sorry I never picked this back up! It looks like someone else had the same idea in https://github.com/dbt-labs/dbt/issues/3708, and contributed the code for it, too.

I’ve become more amenable to this idea over the past several months. There’s a good chance it will be implemented after all.

0 reactions
hui-zheng commented, Mar 18, 2021

On a separate note,

Using a dedicated BQ slot project for running queries is a very BigQuery-specific concept. It makes sense to provide a proper way to specify it in the profile’s BigQuery connection, so that we don’t have to hijack target.project in profile.yml for that purpose.

target.project should be reserved for defining the default location of data assets (that is, dbt models and tables), not for defining the compute resource.

Please reconsider the original proposal of adding a bq_job_project key to profile.yml.
