Support BigQuery Job Tags and Labels
Describe the feature
I would like to be able to control tagging and labeling of BigQuery Jobs as I run dbt on BigQuery.
A similar (but not the same) issue is #1947, for labeling BigQuery Tables and Datasets. This issue focuses on BigQuery Jobs (such as Insert Jobs or Query Jobs).
Describe alternatives you’ve considered
It’s not possible to label or tag jobs after they have started. From the docs:
"You cannot add labels to or update labels on pending, running, or completed jobs."
Additional context
The main reason to tag and label a BigQuery Job is to analyze BigQuery spend. For example, if one could link a BigQuery Job to a particular Airflow operator run (or similar; in my case a Python script run by cron! 😄), then a real dollar value could be put on running that operator over time.
I think it’s important to give the developer control over which tags and labels are added, so that it supports their data processing setup. That is why I think tags and labels should be settable at launch time. (In my case, I run a Python script that calls dbt run. I would want my Python script to be able to set the BigQuery Job tags and labels, while the Jobs are ultimately launched by dbt run.)
Who will this benefit?
Folks who are responsible for their BigQuery spend should benefit by using relevant Job tags and labels.
Thanks!
Issue Analytics
- State:
- Created 3 years ago
- Reactions: 5
- Comments: 12 (8 by maintainers)

I’m not very familiar with the dbt internals, so it would probably take me some time to figure out, but I’d be happy to give this a try if nobody picks it up first.
This isn’t something we’re prioritizing now. FYI #2809 did add `invocation_id` as a label to all dbt-bigquery jobs, starting in v0.19.0. That `invocation_id` can be used to query `INFORMATION_SCHEMA.JOBS_BY_*` and calculate total time/spend per invocation; it can also be used to associate BigQuery query history with dbt run artifacts (docs), namely `run_results.json`. Those run artifacts will contain lots of useful metadata, including (e.g.) any environment variables prefixed with `DBT_ENV_CUSTOM_ENV_`.
I agree that dbt should be able to pass more information than just the `invocation_id`, though I think that’s a strong start. A `--job-label` flag that allows the user / orchestration tool to set one value for all nodes in an invocation should be straightforward to implement. The more I think about it, though, I find it functionally limiting but also one-off as an implementation, not well integrated with existing dbt constructs that seek to accomplish the same goal.
I do think the best version of this would make available the full query comment context as per-node job labels. That context, available to the `query_comment` macro, is defined in `query_headers.py`. I agree with @hui-zheng, it’s quite easy to pass environment variables or `--vars` into the `query-comment` config or `query_comment` macro, so this approach would solve for both use cases we’ve been discussing.
The string version of this comment (the default value, the string passed to the config, or the value returned by the custom macro) is available to the connection manager, via `set_query_header` and `_add_query_comment`. The `execute` method already calls `_add_query_comment` to prepend the comment to SQL before execution:
https://github.com/fishtown-analytics/dbt/blob/344a14416d22f0cfbeb56b9904092c8a4f38b1fc/plugins/bigquery/dbt/adapters/bigquery/connections.py#L333-L336
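For reference, the per-invocation spend lookup mentioned earlier in this comment might look roughly like the sketch below. It only builds the SQL string, so nothing here talks to BigQuery. The function name, the `region-us` qualifier, the 7-day window, and the `dbt_invocation_id` label key are all assumptions for illustration; check the labels on one of your own jobs to confirm which key dbt actually writes.

```python
import re


def build_invocation_spend_query(
    region: str = "us",                      # assumed region; adjust to your datasets
    label_key: str = "dbt_invocation_id",    # assumed label key; verify on a real job
    lookback_days: int = 7,
) -> str:
    """Return BigQuery SQL that sums bytes billed and slot time per labelled invocation."""
    # The values are interpolated into SQL, so accept only simple identifiers.
    if not re.fullmatch(r"[a-z0-9-]+", region):
        raise ValueError(f"unexpected region: {region!r}")
    if not re.fullmatch(r"[a-z][a-z0-9_-]{0,62}", label_key):
        raise ValueError(f"unexpected label key: {label_key!r}")
    days = int(lookback_days)
    if days < 1:
        raise ValueError("lookback_days must be at least 1")

    # `labels` is an array of (key, value) structs, so it is unnested to filter by key.
    return f"""
SELECT
  label.value AS invocation_id,
  COUNT(*) AS job_count,
  SUM(total_bytes_billed) AS total_bytes_billed,
  SUM(total_slot_ms) AS total_slot_ms
FROM `region-{region}`.INFORMATION_SCHEMA.JOBS_BY_PROJECT AS j,
  UNNEST(j.labels) AS label
WHERE label.key = '{label_key}'
  AND j.creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {days} DAY)
GROUP BY invocation_id
ORDER BY total_bytes_billed DESC
""".strip()


if __name__ == "__main__":
    print(build_invocation_spend_query())
```

The resulting rows can be joined to `run_results.json` on the invocation id outside BigQuery. Turning bytes billed into a dollar figure depends on your pricing model, so that step is left out.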
So here’s what I’m thinking about:
- A new config nested under `query_comment` in `dbt_project.yml`, called `job_label: true | false`.
- If `query_comment.job_label` is turned on, and the query comment config/macro returns a dict / JSON string (such as the advanced usage example in the docs), should dbt try to parse the returned value into a Python dict, and pass each key-value pair as a separate label? I think yes; this should even work for the default query comment value.
- If `query_comment.job_label` is turned on, and the query-comment returns an unstructured string, should dbt still try to pass the first 128 bytes (truncated if needed) as the value to a single label called `query_comment`? I still think yes, but I’m open to your thoughts on this point (and every point above).
Having written all that out, acknowledging that there are a few tricky pieces, I do think the requisite changes would be relatively self-contained in the codebase. Would anyone be interested in giving it a go?
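To make the parsing behaviour proposed above concrete, here is a minimal sketch of turning a query comment string into job labels. It is not dbt's implementation: the function names are hypothetical, and the sanitizing rule (lowercase, with anything outside `a-z`, `0-9`, `_`, `-` replaced by `_`) is a simplification. The comment above suggests truncating to 128 bytes; BigQuery's label documentation lists a 63-character cap on keys and values, so the limit is a parameter here and worth checking against the current docs.

```python
import json
import re

# Simplified assumption: keep only lowercase ASCII letters, digits, "_" and "-".
_INVALID_LABEL_CHARS = re.compile(r"[^a-z0-9_-]")


def sanitize_label(text: str, max_len: int = 63) -> str:
    """Lowercase the text, replace disallowed characters with "_", then truncate."""
    return _INVALID_LABEL_CHARS.sub("_", text.lower())[:max_len]


def labels_from_query_comment(comment: str, max_len: int = 63) -> dict:
    """Hypothetical helper: derive job labels from a query comment string.

    A JSON object becomes one label per key-value pair; anything else becomes
    a single label named "query_comment".
    """
    try:
        parsed = json.loads(comment)
    except (TypeError, ValueError):  # json.JSONDecodeError subclasses ValueError
        parsed = None

    if isinstance(parsed, dict):
        return {
            sanitize_label(str(key), max_len): sanitize_label(str(value), max_len)
            for key, value in parsed.items()
        }
    return {"query_comment": sanitize_label(comment, max_len)}


if __name__ == "__main__":
    structured = '{"app": "dbt", "dbt_version": "0.19.0", "node_id": "model.my_project.orders"}'
    print(labels_from_query_comment(structured))
    # {'app': 'dbt', 'dbt_version': '0_19_0', 'node_id': 'model_my_project_orders'}
    print(labels_from_query_comment("run by nightly cron"))
    # {'query_comment': 'run_by_nightly_cron'}
```

Keys are not checked for a leading letter or for collisions after sanitizing, and nested JSON values are flattened through `str()`; a real implementation would need to decide how to handle those cases.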