
All Dataproc operators raise 404 error

See original GitHub issue

Hello,

I was using the Dataproc operators and kept running into the same error every time a task was called. Consider the following simple DAG.

from airflow import DAG
from airflow.utils.dates import days_ago
from datetime import timedelta

from airflow.providers.google.cloud.operators.dataproc import DataprocCreateClusterOperator

CLUSTER_CONFIG = {
    "master_config": {
        "num_instances": 1,
        "machine_type_uri": "n1-standard-4",
        "disk_config": {"boot_disk_type": "pd-standard", "boot_disk_size_gb": 1024},
    },
    "worker_config": {
        "num_instances": 2,
        "machine_type_uri": "n1-standard-4",
        "disk_config": {"boot_disk_type": "pd-standard", "boot_disk_size_gb": 1024},
    }
}

default_args = {
    'owner': 'maxime',
    'start_date': days_ago(2),
    'retries': 0,
    'retry_delay': timedelta(minutes=10),
    'project_id': "driven-crawler-276317",
    'region': "us-central1-a"
}

with DAG("dag_dataproc", default_args=default_args, schedule_interval=None) as dag:

    task_create_dataproc = DataprocCreateClusterOperator(
        task_id='create_dataproc',
        cluster_name="test",
        project_id="driven-crawler-276317",
        region="us-central1-a",
        cluster_config=CLUSTER_CONFIG
    )

    task_create_dataproc

Testing the create_dataproc task (or a delete task), or backfilling the entire DAG, immediately raises the following exception.

Traceback (most recent call last):
  File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 73, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.UNIMPLEMENTED
        details = "Received http2 header with status: 404"
        debug_error_string = "{"created":"@1620129208.396761464","description":"Received http2 :status header with non-200 OK status","file":"src/core/ext/filters/http/client/http_client_filter.cc","file_line":129,"grpc_message":"Received http2 header with status: 404","grpc_status":12,"value":"404"}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/bin/airflow", line 8, in <module>
    sys.exit(main())
  File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/airflow/__main__.py", line 40, in main
    args.func(args)
  File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/airflow/cli/cli_parser.py", line 48, in command
    return func(*args, **kwargs)
  File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/airflow/utils/cli.py", line 89, in wrapper
    return f(*args, **kwargs)
  File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/airflow/cli/commands/task_command.py", line 385, in task_test
    ti.run(ignore_task_deps=True, ignore_ti_state=True, test_mode=True)
  File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/airflow/utils/session.py", line 65, in wrapper
    return func(*args, session=session, **kwargs)
  File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1393, in run
    session=session,
  File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/airflow/utils/session.py", line 62, in wrapper
    return func(*args, **kwargs)
  File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1112, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1285, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1315, in _execute_task
    result = task_copy.execute(context=context)
  File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/airflow/providers/google/cloud/operators/dataproc.py", line 603, in execute
    cluster = self._create_cluster(hook)
  File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/airflow/providers/google/cloud/operators/dataproc.py", line 540, in _create_cluster
    metadata=self.metadata,
  File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/airflow/providers/google/common/hooks/base_google.py", line 425, in inner_wrapper
    return func(self, *args, **kwargs)
  File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/airflow/providers/google/cloud/hooks/dataproc.py", line 304, in create_cluster
    metadata=metadata,
  File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/google/cloud/dataproc_v1beta2/services/cluster_controller/client.py", line 429, in create_cluster
    response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
  File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py", line 145, in __call__
    return wrapped_func(*args, **kwargs)
  File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/google/api_core/timeout.py", line 102, in func_with_timeout
    return func(*args, **kwargs)
  File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 75, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "<string>", line 3, in raise_from
google.api_core.exceptions.MethodNotImplemented: 501 Received http2 header with status: 404

I have a connection configured with full access to GCP resources. Has anyone else encountered this issue?

Apache Airflow version: 2.0.2

Environment:

  • Cloud provider or hardware configuration: GCP
  • OS: Ubuntu 18.04 LTS
  • Kernel: Linux 5.4.0-72-generic
  • Install tools:
  • Others: pip freeze output for the Airflow environment:
apache-airflow==2.0.2
apache-airflow-providers-ftp==1.0.1
apache-airflow-providers-google==2.2.0
apache-airflow-providers-http==1.1.1
apache-airflow-providers-imap==1.0.1
apache-airflow-providers-sqlite==1.0.2
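
A note for readers who land on this trace: a grpc_status of 12 (UNIMPLEMENTED) wrapping "Received http2 header with status: 404" usually means the client reached a hostname or path the service does not serve. The provider's Dataproc hook derives a region-scoped API endpoint from the region argument, roughly as in the sketch below (illustrative only, not the provider's exact code), so a zone value in the region slot targets a host Dataproc does not answer:

def dataproc_api_endpoint(region: str) -> str:
    # Non-global regions get a region-scoped host,
    # e.g. "us-central1-dataproc.googleapis.com:443".
    if region and region != "global":
        return f"{region}-dataproc.googleapis.com:443"
    return "dataproc.googleapis.com:443"

# The zone "us-central1-a" passed as region yields
# "us-central1-a-dataproc.googleapis.com:443", an endpoint the
# service does not serve: http2 404, surfaced as gRPC UNIMPLEMENTED.
print(dataproc_api_endpoint("us-central1-a"))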

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments:5 (2 by maintainers)

Top GitHub Comments

1 reaction
MaximeJumelle commented, May 4, 2021

Hi, thank you very much for your response. That was indeed the trick. I was confused because in previous versions the zone parameter was required instead of region. 😃
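
For readers skimming for the fix: a minimal sketch of the corrected operator call, identical to the DAG in the question except that the region argument gets the region "us-central1" instead of the zone "us-central1-a" (the 'region' entry in default_args needs the same change):

task_create_dataproc = DataprocCreateClusterOperator(
    task_id='create_dataproc',
    cluster_name="test",
    project_id="driven-crawler-276317",
    region="us-central1",  # a region, not the zone "us-central1-a"
    cluster_config=CLUSTER_CONFIG,
)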

0 reactions
MaximeJumelle commented, May 4, 2021

I think this is related to upgrading from Airflow 1.x to Airflow 2.x, since I used the following code on Airflow 1.x.

task_create_dataproc = DataprocClusterCreateOperator(
    task_id='create_dataproc',
    cluster_name="cluster-{{ ds_nodash }}",
    num_workers=2,
    zone="us-central1-a",
    master_machine_type='n1-standard-4',
    worker_machine_type='n1-standard-4',
    idle_delete_ttl=3600,
    dag=dag
)

I was confused because versions 2.x asked me to pass region instead of zone, and I didn't spot the change from zone to region because the Dataproc example page defines a ZONE variable (but never uses it).
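
For anyone porting similar 1.x code: the google provider ships a ClusterGenerator helper that accepts the old-style arguments (zone, num_workers, machine types, idle_delete_ttl) and builds the cluster_config dict for the 2.x operator. A sketch of one way to port the snippet above, with the project ID and zone taken from this thread (check the helper's signature for your provider version):

from airflow.providers.google.cloud.operators.dataproc import (
    ClusterGenerator,
    DataprocCreateClusterOperator,
)

# Build a 2.x-style cluster_config from the 1.x-style arguments.
cluster_config = ClusterGenerator(
    project_id="driven-crawler-276317",
    zone="us-central1-a",            # the zone lives inside the config...
    num_workers=2,
    master_machine_type="n1-standard-4",
    worker_machine_type="n1-standard-4",
    idle_delete_ttl=3600,
).make()

task_create_dataproc = DataprocCreateClusterOperator(
    task_id="create_dataproc",
    cluster_name="cluster-{{ ds_nodash }}",
    project_id="driven-crawler-276317",
    region="us-central1",            # ...while the operator takes the region
    cluster_config=cluster_config,
)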
