Regional Dataproc Workflow Template Instantiate Fails
Apache Airflow version: GCP Composer (1.10.12+composer)
Kubernetes version (if you are using kubernetes) (use kubectl version): GKE (1.16.13-gke.404)
Environment: GCP Composer
- Cloud provider or hardware configuration: GCP
- OS (e.g. from /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools:
- Others:
What happened: Instantiating a regional workflow template using the Dataproc operator fails.
[2020-12-03 02:48:24,503] {taskinstance.py:1153} ERROR - 400 Region 'us-central1' specified in request does not match endpoint region 'global'. To use 'us-central1' region, specify 'us-central1' region in request and configure client to use 'us-central1-dataproc.googleapis.com:443' endpoint.
Traceback (most recent call last):
  File "/opt/python3.6/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 57, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/opt/python3.6/lib/python3.6/site-packages/grpc/_channel.py", line 826, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/opt/python3.6/lib/python3.6/site-packages/grpc/_channel.py", line 729, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.INVALID_ARGUMENT
    details = "Region 'us-central1' specified in request does not match endpoint region 'global'. To use 'us-central1' region, specify 'us-central1' region in request and configure client to use 'us-central1-dataproc.googleapis.com:443' endpoint."
    debug_error_string = "{"created":"@1606963704.503150879","description":"Error received from peer ipv4:142.250.71.74:443","file":"src/core/lib/surface/call.cc","file_line":1061,"grpc_message":"Region 'us-central1' specified in request does not match endpoint region 'global'. To use 'us-central1' region, specify 'us-central1' region in request and configure client to use 'us-central1-dataproc.googleapis.com:443' endpoint.","grpc_status":3}"
>
What you expected to happen:
It seems the endpoint is not handled correctly: the Dataproc hook uses the global endpoint even when the template lives in a region and the region parameter is set. I think the hook method that builds the workflow template client should be updated to take the location/region and pass client_options pointing at the regional endpoint to the Dataproc gRPC client.
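For illustration only, a minimal sketch of that idea (not the provider's actual code; it assumes the dataproc_v1beta2 client that the provider used at the time):

# Sketch only: build the workflow template client against the regional
# endpoint instead of the default global one.
from google.api_core.client_options import ClientOptions
from google.cloud.dataproc_v1beta2 import WorkflowTemplateServiceClient

region = "us-central1"

# For any region other than "global", Dataproc expects the endpoint
# "<region>-dataproc.googleapis.com:443".
client = WorkflowTemplateServiceClient(
    client_options=ClientOptions(
        api_endpoint=f"{region}-dataproc.googleapis.com:443"
    )
)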
How to reproduce it:
Requires a GCP project.
1. Create a workflow template:
gcloud dataproc workflow-templates create WORKFLOW_TMPL --region us-central1
2. Schedule the following DAG:
import airflow
from airflow import DAG
from datetime import timedelta
from airflow.providers.google.cloud.operators.dataproc import DataprocInstantiateWorkflowTemplateOperator

default_args = {
    'start_date': airflow.utils.dates.days_ago(0),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG(
    'dataproc_template_test',
    default_args=default_args,
    description='test dataproc workflow template',
    schedule_interval=None,
    dagrun_timeout=timedelta(minutes=20),
)

start_template_job = DataprocInstantiateWorkflowTemplateOperator(
    # The task id of your job
    task_id="dataproc_workflow_dag",
    # The template id of your workflow
    template_id="TEMPLATE_ID",
    project_id="PROJECT_ID",
    # The region for the template
    region="us-central1",
    dag=dag,
)
I could work around it by overriding hook methods in the DAG. I'll send a pull request to fix it in the main code as well.
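A hedged sketch of that kind of override (not the commenter's actual code; it assumes the hook builds its client in a get_template_client method and exposes credentials via _get_credentials, as provider releases of that era did):

# Sketch of such a workaround, NOT the commenter's actual code.
from google.api_core.client_options import ClientOptions
from google.cloud.dataproc_v1beta2 import WorkflowTemplateServiceClient

from airflow.providers.google.cloud.hooks import dataproc

REGION = "us-central1"  # the region the workflow template lives in


def _regional_get_template_client(self):
    # Same credentials as the stock hook, but the gRPC channel targets
    # the regional endpoint instead of the global one.
    return WorkflowTemplateServiceClient(
        credentials=self._get_credentials(),
        client_options=ClientOptions(
            api_endpoint=f"{REGION}-dataproc.googleapis.com:443"
        ),
    )


# Apply the patch before any Dataproc operator runs, e.g. at the top of
# the DAG file, so instantiate_workflow_template uses the regional client.
dataproc.DataprocHook.get_template_client = _regional_get_template_client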
@DenisOgr Run
pip install -U apache-airflow-providers-google
Docs: https://airflow.apache.org/docs/apache-airflow-providers-google/stable/index.html
Changes: https://airflow.apache.org/docs/apache-airflow-providers-google/stable/commits.html