
BigQueryInsertJobOperator fails when there are templated variables in default args


Apache Airflow Provider(s)

google

Versions of Apache Airflow Providers

apache-airflow-providers-google==6.8.0

Apache Airflow version

2.2.3

Operating System

n/a

Deployment

Composer

Deployment details

Hi! I’m using composer-2.0.6-airflow-2.2.3 - it’s a Public IP environment without any configuration overrides. This is a super basic sandbox environment I use for testing.

What happened

I was experimenting with the BigQueryInsertJobOperator and hit a failure when I tried to use Airflow variables within a job configuration. Error:

google.api_core.exceptions.BadRequest: 400 POST https://bigquery.googleapis.com/bigquery/v2/projects/%7B%7Bvar.value.gcp_project%7D%7D/jobs?prettyPrint=false: Invalid project ID '{{var.value.gcp_project}}'. Project IDs must contain 6-63 lowercase letters, digits, or dashes. Some project IDs also include domain name separated by a colon. IDs must start with a letter and may not end with a dash.
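The %7B%7B...%7D%7D in the request path is just the unrendered Jinja string being percent-encoded, which confirms the template was never expanded before the HTTP request was built. A quick illustration with only the standard library:

    from urllib.parse import quote

    # The literal, unrendered template string is percent-encoded into the URL path.
    print(quote("{{var.value.gcp_project}}", safe=""))
    # -> %7B%7Bvar.value.gcp_project%7D%7D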

DAG pseudocode (I copy-pasted the relevant bits of my DAG)

  • BQ_DESTINATION_TABLE_NAME and BQ_DESTINATION_DATASET_NAME are strings, not Airflow variables, so they’re doing great.
  • WEATHER_HOLIDAYS_JOIN_QUERY is a SQL query, also defined as a string, and as far as I can tell is also doing great.
  • PROJECT_NAME uses a templated Airflow variable that is defined and is successfully used in other operators in this and other DAGs.
PROJECT_NAME = '{{var.value.gcp_project}}'

bq_join_holidays_weather_data = bigquery.BigQueryInsertJobOperator(
    task_id="bq_join_holidays_weather_data",
    configuration={
        "query": {
            "query": WEATHER_HOLIDAYS_JOIN_QUERY,
            "useLegacySql": False,
            "destinationTable": {
                "projectId": PROJECT_NAME,
                "datasetId": BQ_DESTINATION_DATASET_NAME,
                "tableId": BQ_DESTINATION_TABLE_NAME,
            },
        }
    },
    location="US",
)

Some things I tried/researched: I experimented a little with adding "configuration.query.destination_table": "json" to this line, but did not have success. I also checked out the DataprocSubmitJobOperator to see if I could find some clues, because Dataproc configurations often have many nested dictionaries and I’m about 90% certain I’ve templated values there. I had to timebox this, though, because I do have a workaround (just not using the Airflow variable), and I thought I’d open an issue to see if someone more familiar with the underlying template rendering could more easily decipher what’s happening.

What you think should happen instead

I think that I should be allowed to use an Airflow variable here 😁

How to reproduce

Run a query job using the BigQueryInsertJobOperator that writes the query results to a destination table specified as a fully qualified TableReference object, pass the projectId parameter as a templated Airflow variable, and also have project as a default arg pointing to a templated variable. A sketch of this setup follows.
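For illustration, a minimal self-contained repro might look like the sketch below. This is a sketch under stated assumptions, not the reporter’s exact DAG: the default arg is assumed to be project_id (an argument BigQueryInsertJobOperator accepts), and the DAG id, query, dataset, and table names are placeholders.

    import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

    # Assumed: the templated default arg that every task in the DAG inherits.
    default_args = {
        "project_id": "{{ var.value.gcp_project }}",
    }

    with DAG(
        dag_id="bq_insert_job_repro",  # hypothetical DAG id
        start_date=datetime.datetime(2022, 1, 1),
        schedule_interval=None,
        default_args=default_args,
    ) as dag:
        BigQueryInsertJobOperator(
            task_id="bq_join_holidays_weather_data",
            configuration={
                "query": {
                    "query": "SELECT 1",  # placeholder for the real join query
                    "useLegacySql": False,
                    "destinationTable": {
                        "projectId": "{{ var.value.gcp_project }}",
                        "datasetId": "my_dataset",  # placeholder
                        "tableId": "my_table",      # placeholder
                    },
                }
            },
            location="US",
        )

The likely mechanism: the destinationTable values live inside configuration, which is a templated field of the operator, while the project_id coming in through default_args is passed through unrendered. That matches the error URL above, where the unrendered template appears in the /projects/.../jobs path rather than in the destination table reference.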

Anything else

I am willing to submit a PR, but if someone else also wants to, they might get to it faster than I will, especially between now and the summit.

Also, it’s been a while since I submitted an issue, and this form is INCREDIBLE. Well done, friends!

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!


Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 12 (12 by maintainers)

Top GitHub Comments

2 reactions
potiuk commented, May 9, 2022

Also @leahecole, TIL while answering someone’s question.

I think you COULD use user-defined macros to achieve what you want: user_defined_macros at DAG level

user_defined_macros = {"project_id": PROJECT_ID}

...

task_1(project_id='{{ project_id }}')
...
task_2(project_id='{{ project_id }}')
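Fleshed out, that suggestion might look like the sketch below, where PROJECT_ID is an ordinary Python string resolved at DAG-parse time and user_defined_macros makes {{ project_id }} available in any templated field of the DAG. The DAG id, project, dataset, and table names here are hypothetical placeholders.

    import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

    PROJECT_ID = "my-gcp-project"  # hypothetical concrete value

    with DAG(
        dag_id="bq_with_user_defined_macros",  # hypothetical DAG id
        start_date=datetime.datetime(2022, 1, 1),
        schedule_interval=None,
        # Exposes {{ project_id }} to every templated field in this DAG.
        user_defined_macros={"project_id": PROJECT_ID},
    ) as dag:
        BigQueryInsertJobOperator(
            task_id="bq_query",
            configuration={
                "query": {
                    "query": "SELECT 1",  # placeholder query
                    "useLegacySql": False,
                    "destinationTable": {
                        "projectId": "{{ project_id }}",  # rendered via the user-defined macro
                        "datasetId": "my_dataset",
                        "tableId": "my_table",
                    },
                }
            },
            location="US",
        )

Note this only helps for fields that are actually templated, such as configuration here; the macro is substituted when the field is rendered, so no Airflow Variable lookup is involved at all.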

1 reaction
eladkal commented, Jun 30, 2022

Sure, we can do that. Will you start a PR? In the meantime I’m closing the issue, as there is no bug to fix 😃


