BigQueryHook `create_empty_dataset` missing `datasetReference`
Apache Airflow version: 2.0.2
Kubernetes version (if you are using kubernetes) (use kubectl version): 1.19
Environment:
- Cloud provider or hardware configuration: GKE
- OS (e.g. from /etc/os-release): Debian GNU/Linux 10 (buster)
- Kernel (e.g. uname -a): x86_64 GNU/Linux
- Install tools:
- Others:
What happened:
Using the dataset_reference argument of BigQueryCreateEmptyDatasetOperator to set table expiration throws an error:
[2021-05-27 00:03:23,999] {taskinstance.py:1455} ERROR - 'datasetReference'
Traceback (most recent call last):
  File "/opt/bitnami/airflow/venv/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1112, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/opt/bitnami/airflow/venv/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1285, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/opt/bitnami/airflow/venv/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1315, in _execute_task
    result = task_copy.execute(context=context)
  File "/opt/bitnami/airflow/venv/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/bigquery.py", line 1419, in execute
    bq_hook.create_empty_dataset(
  File "/opt/bitnami/airflow/venv/lib/python3.8/site-packages/airflow/providers/google/common/hooks/base_google.py", line 425, in inner_wrapper
    return func(self, *args, **kwargs)
  File "/opt/bitnami/airflow/venv/lib/python3.8/site-packages/airflow/providers/google/cloud/hooks/bigquery.py", line 414, in create_empty_dataset
    specified_param = dataset_reference["datasetReference"].get(param)
KeyError: 'datasetReference'
What you expected to happen:
I expected no error; the operator's documentation shows a similar dataset_reference dict:
create_new_dataset = BigQueryCreateEmptyDatasetOperator(
    dataset_id='new-dataset',
    project_id='my-project',
    dataset_reference={"friendlyName": "New Dataset"},
    gcp_conn_id='_my_gcp_conn_',
    task_id='newDatasetCreator',
    dag=dag
)
How to reproduce it:
create_dataset = BigQueryCreateEmptyDatasetOperator(
    task_id='create_dataset',
    project_id=PROJECT,
    dataset_id=DATASET,
    dataset_reference={"defaultTableExpirationMs": str(1000 * 60 * 60 * 24 * 30)},
    dag=dag
)
Anything else we need to know:
The create_empty_dataset method of the BigQueryHook class expects datasetReference to always be a key in the dictionary (see the last frame of the traceback above).
I was able to work around the error by adding it:
create_dataset = BigQueryCreateEmptyDatasetOperator(
    task_id='create_dataset',
    project_id=PROJECT,
    dataset_id=DATASET,
    dataset_reference={"datasetReference": {}, "defaultTableExpirationMs": str(1000 * 60 * 60 * 24 * 30)},
    dag=dag
)
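For illustration, here is a minimal, Airflow-free sketch of why the lookup fails and how a hook could tolerate a dict without the datasetReference key. The helper name fill_dataset_reference and the sample IDs are hypothetical; this is not the provider's actual code or the fix that was eventually merged, only one way the missing key could be handled.

```python
from typing import Any, Dict, Optional


def fill_dataset_reference(
    dataset_reference: Optional[Dict[str, Any]],
    dataset_id: Optional[str],
    project_id: Optional[str],
) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of the dataset resource with
    datasetReference.datasetId and datasetReference.projectId filled in."""
    # Copy so the caller's dict is not mutated.
    resource = dict(dataset_reference or {})
    # Tolerate a missing "datasetReference" key instead of indexing it directly.
    ref = dict(resource.get("datasetReference") or {})
    for param, value in (("datasetId", dataset_id), ("projectId", project_id)):
        if ref.get(param):
            continue  # a value given inside the dict takes precedence
        if not value:
            raise ValueError(f"{param} was given neither as an argument nor in dataset_reference")
        ref[param] = value
    resource["datasetReference"] = ref
    return resource


# Same dict as in the reproduction above.
user_dict = {"defaultTableExpirationMs": str(1000 * 60 * 60 * 24 * 30)}

# Direct indexing, as in the last traceback frame, raises KeyError.
try:
    user_dict["datasetReference"].get("datasetId")
except KeyError as exc:
    print("KeyError:", exc)

# "my_dataset" and "my-project" are placeholder values.
resource = fill_dataset_reference(user_dict, "my_dataset", "my-project")
print(resource)
```

With a guard like this, the dict from the reproduction would not need the empty "datasetReference": {} entry used in the workaround.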
Top GitHub Comments
@eladkal - please, assign this item to me - I will try to fix it.
@g-saxena assigned