Re-running KubernetesPodOperator task results in AirflowException using Airflow 2.0

Apache Airflow version: 2.0.0

Kubernetes version (from kubectl version): 1.19.4

Environment:

  • Cloud provider or hardware configuration: Google Cloud Platform/GKE

What happened:

I successfully cleared the state of a failed task using the graph view UI, but when I attempted to re-run the cleared task instance in graph view manually by selecting the task instance and clicking “Run”, I received the following error:

Something bad has happened.
Please consider letting us know by creating a bug report using GitHub.

Python version: 3.8.7
Airflow version: 2.0.0
Node: mr_node
-------------------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/airflow/.local/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/airflow/.local/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/airflow/.local/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/home/airflow/.local/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/airflow/.local/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/www/auth.py", line 34, in decorated
    return func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/www/decorators.py", line 60, in wrapper
    return f(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/www/views.py", line 1366, in run
    executor.start()
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/executors/kubernetes_executor.py", line 493, in start
    raise AirflowException("Could not get scheduler_job_id")
airflow.exceptions.AirflowException: Could not get scheduler_job_id
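
The last two frames of the traceback hint at the cause: the web UI's "Run" action starts an executor itself, and the Kubernetes executor refuses to start without a scheduler job ID. The sketch below models that interaction with hypothetical stand-in classes and functions (`SketchKubernetesExecutor`, `run_from_scheduler`, `run_from_web_ui`). It is not Airflow's actual code, and the assumption that only a running scheduler supplies the ID is inferred from the error message.

```python
from typing import Optional


class AirflowException(Exception):
    """Stand-in for airflow.exceptions.AirflowException."""


class SketchKubernetesExecutor:
    """Hypothetical, simplified model of an executor that needs a scheduler job ID."""

    def __init__(self) -> None:
        # Assumption: only a running scheduler job fills this in.
        self.job_id: Optional[str] = None

    def start(self) -> None:
        # Same message as the exception shown in the traceback.
        if not self.job_id:
            raise AirflowException("Could not get scheduler_job_id")


def run_from_scheduler(job_id: str) -> str:
    """The scheduler path: the job ID is set before the executor starts."""
    executor = SketchKubernetesExecutor()
    executor.job_id = job_id
    executor.start()
    return "started"


def run_from_web_ui() -> str:
    """The web UI path: a fresh executor is started with no job ID."""
    executor = SketchKubernetesExecutor()
    try:
        executor.start()
    except AirflowException as exc:
        return f"failed: {exc}"
    return "started"


if __name__ == "__main__":
    print(run_from_scheduler("42"))
    print(run_from_web_ui())
```

Under these assumptions, the failure does not depend on the task or the DAG: any executor started outside the scheduler would hit the same check.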

What you expected to happen:

I expected the task instance to be scheduled and begin running again.

How to reproduce it:

Configure Airflow 2.0.0 to run on GCP, clear the state of a finished task instance using the UI (I was able to reproduce the error on a task instance marked “Success” as well), and then use the web UI to “Run” the task.

Anything else we need to know:

One important item to note: when I only clear the task instance and do not attempt to run it manually from the UI, the task is queued and placed in a running state, but it quickly fails with the following error:

[2021-01-08 21:08:16,140] {taskinstance.py:1396} ERROR - (0)
Reason: Handshake status 500 Internal Server Error
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/stream/ws_client.py", line 296, in websocket_call
    client = WSClient(configuration, get_websocket_url(url), headers, capture_all)
  File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/stream/ws_client.py", line 94, in __init__
    self.sock.connect(url, header=header)
  File "/home/airflow/.local/lib/python3.8/site-packages/websocket/_core.py", line 226, in connect
    self.handshake_response = handshake(self.sock, *addrs, **options)
  File "/home/airflow/.local/lib/python3.8/site-packages/websocket/_handshake.py", line 80, in handshake
    status, resp = _get_resp_headers(sock)
  File "/home/airflow/.local/lib/python3.8/site-packages/websocket/_handshake.py", line 165, in _get_resp_headers
    raise WebSocketBadStatusException("Handshake status %d %s", status, status_message, resp_headers)
websocket._exceptions.WebSocketBadStatusException: Handshake status 500 Internal Server Error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1086, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1260, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1300, in _execute_task
    result = task_copy.execute(context=context)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 335, in execute
    final_state, result = self.handle_pod_overlap(
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 375, in handle_pod_overlap
    final_state, result = self.monitor_launched_pod(launcher, pod)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 513, in monitor_launched_pod
    (final_state, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/kubernetes/pod_launcher.py", line 151, in monitor_pod
    result = self._extract_xcom(pod)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/kubernetes/pod_launcher.py", line 246, in _extract_xcom
    resp = kubernetes_stream(
  File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/stream/stream.py", line 35, in stream
    return func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api/core_v1_api.py", line 841, in connect_get_namespaced_pod_exec
    (data) = self.connect_get_namespaced_pod_exec_with_http_info(name, namespace, **kwargs)  # noqa: E501
  File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api/core_v1_api.py", line 927, in connect_get_namespaced_pod_exec_with_http_info
    return self.api_client.call_api(
  File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 340, in call_api
    return self.__call_api(resource_path, method,
  File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 172, in __call_api
    response_data = self.request(
  File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/stream/stream.py", line 30, in _intercept_request_call
    return ws_client.websocket_call(config, *args, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/stream/ws_client.py", line 302, in websocket_call
    raise ApiException(status=0, reason=str(e))
kubernetes.client.rest.ApiException: (0)
Reason: Handshake status 500 Internal Server Error
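
The log line reads `(0)` even though the handshake returned HTTP 500. The frame `raise ApiException(status=0, reason=str(e))` in the traceback accounts for this: the websocket error is wrapped in an API exception whose status is set to 0, and only the reason text keeps the original status. The sketch below reproduces that wrapping with simplified stand-in classes. They are illustrative assumptions, not the real `kubernetes` or `websocket` implementations.

```python
class WebSocketBadStatusException(Exception):
    """Stand-in for websocket._exceptions.WebSocketBadStatusException."""


class ApiException(Exception):
    """Simplified stand-in for kubernetes.client.rest.ApiException."""

    def __init__(self, status: int, reason: str) -> None:
        super().__init__(reason)
        self.status = status
        self.reason = reason

    def __str__(self) -> str:
        # Status first, then the reason, matching the shape of the log output.
        return f"({self.status})\nReason: {self.reason}"


def websocket_call() -> None:
    """Simulate a failed handshake that gets re-raised as an ApiException."""
    try:
        raise WebSocketBadStatusException(
            "Handshake status 500 Internal Server Error"
        )
    except WebSocketBadStatusException as e:
        # Mirrors the traceback frame: the status is hard-coded to 0.
        raise ApiException(status=0, reason=str(e))


def describe_failure() -> str:
    """Return the message a caller would log for the wrapped error."""
    try:
        websocket_call()
    except ApiException as exc:
        return str(exc)
    return "no error"


if __name__ == "__main__":
    print(describe_failure())
```

When reading such logs, the status to look at is therefore the one inside the reason text, not the `(0)` prefix.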

Thanks!

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 8 (7 by maintainers)

Top GitHub Comments

kaxil commented, Jan 21, 2021 (1 reaction)

Looks like a bug with Kubernetes Executor. Related issue: https://github.com/apache/airflow/issues/13805
