BigQueryOperator never uses the location parameter


Apache Airflow version: composer-1.10.4-airflow-1.10.6

Kubernetes version (if you are using kubernetes) (use kubectl version):

Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.10-gke.42", GitCommit:"42bef28c2031a74fc68840fce56834ff7ea08518", GitTreeState:"clean", BuildDate:"2020-06-02T16:07:00Z", GoVersion:"go1.12.12b4", Compiler:"gc", Platform:"linux/amd64"}

What happened and What you think went wrong:

BigQueryOperator does not use the location parameter in order to specify query job location. Instead, it retrieves the automatically determined location from the HTTP request.

This happens because of the following code:

        jobs = self.service.jobs()
        job_data = {'configuration': configuration}

        # Send query and wait for reply.
        query_reply = jobs \
            .insert(projectId=self.project_id, body=job_data) \
            .execute(num_retries=self.num_retries)
        self.running_job_id = query_reply['jobReference']['jobId']
        if 'location' in query_reply['jobReference']:
            location = query_reply['jobReference']['location']
        else:
            location = self.location

The configuration block does not contain a location, so the jobs.insert request is sent without one. The call apparently triggers some internal BigQuery logic to detect the location, and whatever location comes back in query_reply['jobReference'] takes precedence over self.location. In practice this falls back to US more often than not, causing the job to fail with an error saying that the datasets/tables referenced in the query do not exist. Specifying the location argument in the operator, e.g. location='EU', is thus not obeyed.
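One possible way to pin the location is to send it as part of the request body. The sketch below is not the Airflow code; build_job_body is a hypothetical helper that only shows the shape of the body. It relies on the BigQuery REST Job resource accepting a jobReference with a location field, and it assumes the caller passes the operator's location through unchanged.

```python
from typing import Any, Dict, Optional


def build_job_body(configuration: Dict[str, Any],
                   project_id: str,
                   location: Optional[str] = None) -> Dict[str, Any]:
    """Build a jobs.insert request body (hypothetical helper, not Airflow API).

    When a location is given, it is placed in jobReference so that the
    request itself states where the job should run.
    """
    job_data: Dict[str, Any] = {"configuration": configuration}
    if location:
        job_data["jobReference"] = {
            "projectId": project_id,
            "location": location,
        }
    return job_data


# Example: a query configuration pinned to EU.
body = build_job_body(
    {"query": {"query": "select 1", "useLegacySql": False}},
    project_id="example_project",
    location="EU",
)
print(body["jobReference"])
```

With a body like this, the location no longer has to be inferred from the reply. When no location is supplied, the helper leaves jobReference out and the body matches what the code above sends today.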

What you expected to happen: Specifying location as a BigQueryOperator argument leads to execution of the query job in the correct location.

How to reproduce it: Set up a project and dataset in EU containing an example table.

Then, with an initialised local Airflow (airflow initdb) that has been supplied with GCP/BigQuery default connection details, you may run the following code:

from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.bigquery_operator import BigQueryOperator
from airflow.models import TaskInstance


dag = DAG(dag_id="anydag", start_date=datetime.now())
task = BigQueryOperator(
    dag=dag,
    task_id="query_task",
    name="query_task",
    write_disposition="WRITE_TRUNCATE",
    use_legacy_sql=False,
    destination_dataset_table="example_project.example_dataset.example_table",
    sql="select * from `example_project.example_dataset.example_input_table`",
    # Deliberately different from the dataset's location (EU).
    location="US"
)
# Run the task directly, outside the scheduler.
ti = TaskInstance(task=task, execution_date=datetime.now())
task.execute(ti.get_template_context())

The location parameter will probably not be respected; instead, your job will execute in EU.

Occasionally, regardless of the location specified, your job will execute in US. This is difficult to reproduce reliably, as it appears to be flaky and to depend on which location the BigQuery service itself decides the query should run in.
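Because the behaviour is flaky, it may help to make a mismatch fail loudly instead of surfacing later as a "table not found" error. The following is only a sketch: check_job_location is a hypothetical helper, the reply is assumed to have the same jobReference shape as query_reply in the code quoted above, and comparing locations case-insensitively is an assumption.

```python
from typing import Any, Dict


def check_job_location(query_reply: Dict[str, Any], expected: str) -> str:
    """Return the job's location, raising if it differs from the requested one.

    Hypothetical helper: `query_reply` is assumed to look like the reply
    of jobs.insert, i.e. it may carry jobReference.location.
    """
    actual = query_reply.get("jobReference", {}).get("location")
    if actual is None:
        # No location in the reply: fall back to the requested one,
        # mirroring the else-branch of the quoted code.
        return expected
    # Assumption: location names are compared case-insensitively.
    if actual.lower() != expected.lower():
        raise ValueError(
            "job ran in %r but %r was requested" % (actual, expected)
        )
    return actual


# Example with a hand-written reply; no request is sent.
reply = {"jobReference": {"jobId": "job_123", "location": "EU"}}
print(check_job_location(reply, "EU"))
```

Calling it with the same reply and expected="US" raises ValueError, which is the situation the reproduction above provokes.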

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 11 (3 by maintainers)

Top GitHub Comments

mik-laj commented, Jul 27, 2020 (1 reaction)

@muscovitebob please upgrade Cloud Composer to the latest version.

Old environments do not support these packages.

June 24, 2020: Airflow Providers can now be installed inside Cloud Composer.

turbaszek commented, Jul 27, 2020 (1 reaction)

Closing as the reported problem itself seems to be solved.


