
Regional Dataproc Workflow Template Instantiate Fails

See original GitHub issue

Apache Airflow version: GCP Composer (1.10.12+composer)

Kubernetes version (if you are using kubernetes) (use kubectl version): GKE (1.16.13-gke.404)

Environment: GCP Composer

  • Cloud provider or hardware configuration: GCP
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

What happened: Trying to instantiate a regional workflow template using the Dataproc operator fails.

[2020-12-03 02:48:24,503] {taskinstance.py:1153} ERROR - 400 Region 'us-central1' specified in request does not match endpoint region 'global'. To use 'us-central1' region, specify 'us-central1' region in request and configure client to use 'us-central1-dataproc.googleapis.com:443' endpoint.
Traceback (most recent call last):
  File "/opt/python3.6/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 57, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/opt/python3.6/lib/python3.6/site-packages/grpc/_channel.py", line 826, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/opt/python3.6/lib/python3.6/site-packages/grpc/_channel.py", line 729, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.INVALID_ARGUMENT
	details = "Region 'us-central1' specified in request does not match endpoint region 'global'. To use 'us-central1' region, specify 'us-central1' region in request and configure client to use 'us-central1-dataproc.googleapis.com:443' endpoint."
	debug_error_string = "{"created":"@1606963704.503150879","description":"Error received from peer ipv4:142.250.71.74:443","file":"src/core/lib/surface/call.cc","file_line":1061,"grpc_message":"Region 'us-central1' specified in request does not match endpoint region 'global'. To use 'us-central1' region, specify 'us-central1' region in request and configure client to use 'us-central1-dataproc.googleapis.com:443' endpoint.","grpc_status":3}"
>

What you expected to happen:

It seems the endpoint is not handled correctly: the Dataproc hook always uses the global endpoint, even when the template lives in a region and the region parameter is set.

I think the hook's get_template_client method should be updated to take the location/region and pass a client_options argument with the regional endpoint to the Dataproc gRPC client.
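The gist of that change can be sketched as a small helper (the function name is mine, not Airflow's; the endpoint format comes from the error message above):

```python
def dataproc_client_options(region=None):
    """Build client_options for the Dataproc gRPC client.

    Returns a dict routing calls to the regional endpoint, or None so the
    client falls back to the default global endpoint.
    """
    if region and region != 'global':
        return {'api_endpoint': f'{region}-dataproc.googleapis.com:443'}
    return None

# The hook would then pass this to the client constructor, e.g.:
# WorkflowTemplateServiceClient(..., client_options=dataproc_client_options(region))
```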

How to reproduce it: Requires a GCP project.

1. Create a workflow template:

   gcloud dataproc workflow-templates create WORKFLOW_TMPL --region us-central1

2. Schedule the following DAG:

import airflow
from airflow import DAG
from datetime import timedelta
from airflow.providers.google.cloud.operators.dataproc import DataprocInstantiateWorkflowTemplateOperator


default_args = {
    'start_date': airflow.utils.dates.days_ago(0),
    'retries': 1,
    'retry_delay': timedelta(minutes=5)
}

dag = DAG(
    'dataproc_template_test',
    default_args=default_args,
    description='test dataproc workflow template',
    schedule_interval=None,
    dagrun_timeout=timedelta(minutes=20))

start_template_job = DataprocInstantiateWorkflowTemplateOperator(
    # The task id of your job
    task_id="dataproc_workflow_dag",
    # The template id of your workflow
    template_id="TEMPLATE_ID",
    project_id="PROJECT_ID",
    # The region for the template
    region="us-central1",
    dag=dag
)
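To confirm the template itself is fine outside of Airflow, you can instantiate it directly with gcloud first (assumed invocation; the template needs at least one job added before instantiation will succeed):

```shell
gcloud dataproc workflow-templates instantiate WORKFLOW_TMPL --region us-central1
```

If this succeeds while the DAG task fails, the problem is in the hook's endpoint handling rather than the template.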

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
otourzan commented on Dec 8, 2020

I could work around it by overriding the hook methods in the DAG with the following code. I'll send a pull request to fix it in the main codebase as well.

from airflow.providers.google.cloud.hooks.dataproc import DataprocHook
from google.cloud.dataproc_v1beta2 import WorkflowTemplateServiceClient

def get_template_client(self, location=None) -> WorkflowTemplateServiceClient:
    """Returns WorkflowTemplateServiceClient."""
    client_options = (
        {'api_endpoint': f'{location}-dataproc.googleapis.com:443'}
        if location and location != 'global'
        else None
    )

    return WorkflowTemplateServiceClient(
        credentials=self._get_credentials(), client_info=self.client_info, client_options=client_options
    )

def instantiate_workflow_template(
        self,
        location: str,
        template_name: str,
        project_id: str,
        version=None,
        request_id=None,
        parameters=None,
        retry=None,
        timeout=None,
        metadata=None,
    ):
    client = self.get_template_client(location)
    name = client.workflow_template_path(project_id, location, template_name)
    operation = client.instantiate_workflow_template(
        name=name,
        version=version,
        parameters=parameters,
        request_id=request_id,
        retry=retry,
        timeout=timeout,
        metadata=metadata,
    )
    return operation

DataprocHook.get_template_client = get_template_client
DataprocHook.instantiate_workflow_template = instantiate_workflow_template
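For a sanity check on the resource name the patched instantiate_workflow_template builds, the string produced by client.workflow_template_path should have this shape (a sketch that mirrors the assumed v1beta2 path format; the helper below is mine, not the client library):

```python
def workflow_template_name(project_id, location, template_id):
    # Assumed format of WorkflowTemplateServiceClient.workflow_template_path()
    # for a regional template: note "regions", not "locations", in the path.
    return (f"projects/{project_id}/regions/{location}"
            f"/workflowTemplates/{template_id}")
```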