Re-running KubernetesPodOperator task results in AirflowException using Airflow 2.0
See original GitHub issueApache Airflow version: 2.0.0
Kubernetes version (if you are using kubernetes) (use kubectl version
): 1.19.4
Environment:
- Cloud provider or hardware configuration: Google Cloud Platform/GKE
What happened:
I successfully cleared the state of a failed task using the graph view UI, but when I attempted to re-run the cleared task instance in graph view manually by selecting the task instance and clicking “Run”, I received the following error:
Something bad has happened.
Please consider letting us know by creating a bug report using GitHub.
Python version: 3.8.7
Airflow version: 2.0.0
Node: mr_node
-------------------------------------------------------------------------------
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
response = self.full_dispatch_request()
File "/home/airflow/.local/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/home/airflow/.local/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/home/airflow/.local/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/home/airflow/.local/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
rv = self.dispatch_request()
File "/home/airflow/.local/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/www/auth.py", line 34, in decorated
return func(*args, **kwargs)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/www/decorators.py", line 60, in wrapper
return f(*args, **kwargs)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/www/views.py", line 1366, in run
executor.start()
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/executors/kubernetes_executor.py", line 493, in start
raise AirflowException("Could not get scheduler_job_id")
airflow.exceptions.AirflowException: Could not get scheduler_job_id
What you expected to happen:
I expected the task instance to be scheduled and begin running again.
How to reproduce it:
Configure Airflow 2.0.0 to run on GCP, clear the state of a finished task instance using the UI (I was able to reproduce the error on a task instance maked “Success” as well), and again use the Web UI to “Run” the task.
Anything else we need to know:
One important item to note is that when I only clear task instance and do not attempt to run it manually using the UI, the task does queue and is placing in a running
state, but quickly fails with the following error:
[2021-01-08 21:08:16,140] {taskinstance.py:1396} ERROR - (0)
Reason: Handshake status 500 Internal Server Error
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/stream/ws_client.py", line 296, in websocket_call
client = WSClient(configuration, get_websocket_url(url), headers, capture_all)
File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/stream/ws_client.py", line 94, in __init__
self.sock.connect(url, header=header)
File "/home/airflow/.local/lib/python3.8/site-packages/websocket/_core.py", line 226, in connect
self.handshake_response = handshake(self.sock, *addrs, **options)
File "/home/airflow/.local/lib/python3.8/site-packages/websocket/_handshake.py", line 80, in handshake
status, resp = _get_resp_headers(sock)
File "/home/airflow/.local/lib/python3.8/site-packages/websocket/_handshake.py", line 165, in _get_resp_headers
raise WebSocketBadStatusException("Handshake status %d %s", status, status_message, resp_headers)
websocket._exceptions.WebSocketBadStatusException: Handshake status 500 Internal Server Error
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1086, in _run_raw_task
self._prepare_and_execute_task_with_callbacks(context, task)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1260, in _prepare_and_execute_task_with_callbacks
result = self._execute_task(context, task_copy)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1300, in _execute_task
result = task_copy.execute(context=context)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 335, in execute
final_state, result = self.handle_pod_overlap(
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 375, in handle_pod_overlap
final_state, result = self.monitor_launched_pod(launcher, pod)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 513, in monitor_launched_pod
(final_state, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/kubernetes/pod_launcher.py", line 151, in monitor_pod
result = self._extract_xcom(pod)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/kubernetes/pod_launcher.py", line 246, in _extract_xcom
resp = kubernetes_stream(
File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/stream/stream.py", line 35, in stream
return func(*args, **kwargs)
File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api/core_v1_api.py", line 841, in connect_get_namespaced_pod_exec
(data) = self.connect_get_namespaced_pod_exec_with_http_info(name, namespace, **kwargs) # noqa: E501
File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api/core_v1_api.py", line 927, in connect_get_namespaced_pod_exec_with_http_info
return self.api_client.call_api(
File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 340, in call_api
return self.__call_api(resource_path, method,
File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 172, in __call_api
response_data = self.request(
File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/stream/stream.py", line 30, in _intercept_request_call
return ws_client.websocket_call(config, *args, **kwargs)
File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/stream/ws_client.py", line 302, in websocket_call
raise ApiException(status=0, reason=str(e))
kubernetes.client.rest.ApiException: (0)
Reason: Handshake status 500 Internal Server Error
Thanks!
Issue Analytics
- State:
- Created 3 years ago
- Comments:8 (7 by maintainers)
Looks like a bug with Kubernetes Executor. Related issue: https://github.com/apache/airflow/issues/13805
Duplicate of https://github.com/apache/airflow/issues/13805 and closed by https://github.com/apache/airflow/pull/14160