All Dataproc operators raise 404 error
Hello,
I was using the Dataproc operators and ran into the same error every time a task was called. Consider the following simple DAG.
from airflow import DAG
from airflow.utils.dates import days_ago
from datetime import timedelta
from airflow.providers.google.cloud.operators.dataproc import DataprocCreateClusterOperator

CLUSTER_CONFIG = {
    "master_config": {
        "num_instances": 1,
        "machine_type_uri": "n1-standard-4",
        "disk_config": {"boot_disk_type": "pd-standard", "boot_disk_size_gb": 1024},
    },
    "worker_config": {
        "num_instances": 2,
        "machine_type_uri": "n1-standard-4",
        "disk_config": {"boot_disk_type": "pd-standard", "boot_disk_size_gb": 1024},
    },
}

default_args = {
    'owner': 'maxime',
    'start_date': days_ago(2),
    'retries': 0,
    'retry_delay': timedelta(minutes=10),
    'project_id': "driven-crawler-276317",
    'region': "us-central1-a"
}

with DAG("dag_dataproc", default_args=default_args, schedule_interval=None) as dag:
    task_create_dataproc = DataprocCreateClusterOperator(
        task_id='create_dataproc',
        cluster_name="test",
        project_id="driven-crawler-276317",
        region="us-central1-a",
        cluster_config=CLUSTER_CONFIG
    )

    task_create_dataproc
Testing the create_dataproc task (or a delete task), or backfilling the entire DAG, raises the following exception.
Traceback (most recent call last):
File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 73, in error_remapped_callable
return callable_(*args, **kwargs)
File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/grpc/_channel.py", line 946, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNIMPLEMENTED
details = "Received http2 header with status: 404"
debug_error_string = "{"created":"@1620129208.396761464","description":"Received http2 :status header with non-200 OK status","file":"src/core/ext/filters/http/client/http_client_filter.cc","file_line":129,"grpc_message":"Received http2 header with status: 404","grpc_status":12,"value":"404"}"
>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/bin/airflow", line 8, in <module>
sys.exit(main())
File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/airflow/__main__.py", line 40, in main
args.func(args)
File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/airflow/cli/cli_parser.py", line 48, in command
return func(*args, **kwargs)
File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/airflow/utils/cli.py", line 89, in wrapper
return f(*args, **kwargs)
File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/airflow/cli/commands/task_command.py", line 385, in task_test
ti.run(ignore_task_deps=True, ignore_ti_state=True, test_mode=True)
File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/airflow/utils/session.py", line 65, in wrapper
return func(*args, session=session, **kwargs)
File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1393, in run
session=session,
File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/airflow/utils/session.py", line 62, in wrapper
return func(*args, **kwargs)
File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1112, in _run_raw_task
self._prepare_and_execute_task_with_callbacks(context, task)
File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1285, in _prepare_and_execute_task_with_callbacks
result = self._execute_task(context, task_copy)
File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1315, in _execute_task
result = task_copy.execute(context=context)
File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/airflow/providers/google/cloud/operators/dataproc.py", line 603, in execute
cluster = self._create_cluster(hook)
File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/airflow/providers/google/cloud/operators/dataproc.py", line 540, in _create_cluster
metadata=self.metadata,
File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/airflow/providers/google/common/hooks/base_google.py", line 425, in inner_wrapper
return func(self, *args, **kwargs)
File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/airflow/providers/google/cloud/hooks/dataproc.py", line 304, in create_cluster
metadata=metadata,
File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/google/cloud/dataproc_v1beta2/services/cluster_controller/client.py", line 429, in create_cluster
response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py", line 145, in __call__
return wrapped_func(*args, **kwargs)
File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/google/api_core/timeout.py", line 102, in func_with_timeout
return func(*args, **kwargs)
File "/home/maxime/Documents/Repos/Dataproc_Test/sources/airflow/venv/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 75, in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)
File "<string>", line 3, in raise_from
google.api_core.exceptions.MethodNotImplemented: 501 Received http2 header with status: 404
I have a connection configured with full access to GCP resources. Has anyone else already encountered this issue?
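The UNIMPLEMENTED / "404" pairing suggests the gRPC call reached a host that does not serve the Dataproc API. The Google provider hook derives a regional API endpoint from the region parameter, roughly as in the sketch below (simplified from airflow.providers.google.cloud.hooks.dataproc; the exact code varies by provider version), so a zone value such as us-central1-a would yield a hostname that does not serve Dataproc:

from google.api_core.client_options import ClientOptions
from google.cloud.dataproc_v1beta2 import ClusterControllerClient

def get_cluster_client(region: str) -> ClusterControllerClient:
    # Sketch of the hook's endpoint selection: any non-global region is
    # routed to a regional host. A zone like "us-central1-a" produces
    # "us-central1-a-dataproc.googleapis.com", which answers with HTTP 404.
    client_options = None
    if region and region != "global":
        client_options = ClientOptions(
            api_endpoint=f"{region}-dataproc.googleapis.com:443"
        )
    return ClusterControllerClient(client_options=client_options)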
Apache Airflow version: 2.0.2
Environment:
- Cloud provider or hardware configuration: GCP
- OS: Ubuntu 18.04 LTS
- Kernel: Linux 5.4.0-72-generic
- Install tools:
- Others: pip freeze output:
apache-airflow==2.0.2
apache-airflow-providers-ftp==1.0.1
apache-airflow-providers-google==2.2.0
apache-airflow-providers-http==1.1.1
apache-airflow-providers-imap==1.0.1
apache-airflow-providers-sqlite==1.0.2
Top GitHub Comments
Hi, thank you a lot for your response. That is indeed the trick. I was confused because, in previous versions, the zone parameter was needed instead of region. 😃
I think this is related to upgrading between Airflow 1.x and Airflow 2.x, since I used the following code on Airflow 1.x. It asked me to put region instead of zone in versions 2.x, and I didn't notice the change between region and zone because on the example page for Dataproc there is a ZONE variable defined (but never used).
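Following that resolution, the corrected operator call passes a Dataproc region rather than a Compute Engine zone. A minimal sketch, reusing the CLUSTER_CONFIG and project ID from the DAG above:

from airflow.providers.google.cloud.operators.dataproc import DataprocCreateClusterOperator

# Fixed call: region must be a region ("us-central1"), not a zone
# ("us-central1-a"). The 'region' entry in default_args needs the same fix.
task_create_dataproc = DataprocCreateClusterOperator(
    task_id='create_dataproc',
    cluster_name="test",
    project_id="driven-crawler-276317",
    region="us-central1",
    cluster_config=CLUSTER_CONFIG,
)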