KubernetesPodOperator breaks with active log-collection for long running tasks
I’m encountering the same bug reported in https://issues.apache.org/jira/browse/AIRFLOW-3534, with Airflow 1.10.12.
[2020-11-06 13:03:29,672] {pod_launcher.py:173} INFO - Event: fetcher-56104d81c54946a88ce3cd1cf4273477 had an event of type Pending
[2020-11-06 13:03:29,673] {pod_launcher.py:139} WARNING - Pod not yet started: fetcher-56104d81c54946a88ce3cd1cf4273477
[2020-11-06 13:03:30,681] {pod_launcher.py:173} INFO - Event: fetcher-56104d81c54946a88ce3cd1cf4273477 had an event of type Pending
[2020-11-06 13:03:30,681] {pod_launcher.py:139} WARNING - Pod not yet started: fetcher-56104d81c54946a88ce3cd1cf4273477
[2020-11-06 13:03:31,692] {pod_launcher.py:173} INFO - Event: fetcher-56104d81c54946a88ce3cd1cf4273477 had an event of type Pending
[2020-11-06 13:03:31,692] {pod_launcher.py:139} WARNING - Pod not yet started: fetcher-56104d81c54946a88ce3cd1cf4273477
[2020-11-06 13:03:32,702] {pod_launcher.py:173} INFO - Event: fetcher-56104d81c54946a88ce3cd1cf4273477 had an event of type Running
[2020-11-06 13:04:32,740] {taskinstance.py:1150} ERROR - ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
Traceback (most recent call last):
File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/urllib3/response.py", line 696, in _update_chunk_length
self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/urllib3/response.py", line 436, in _error_catcher
yield
File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/urllib3/response.py", line 763, in read_chunked
self._update_chunk_length()
File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/urllib3/response.py", line 700, in _update_chunk_length
raise httplib.IncompleteRead(line)
http.client.IncompleteRead: IncompleteRead(0 bytes read)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 979, in _run_raw_task
result = task_copy.execute(context=context)
File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/airflow/contrib/operators/kubernetes_pod_operator.py", line 284, in execute
final_state, _, result = self.create_new_pod_for_operator(labels, launcher)
File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/airflow/contrib/operators/kubernetes_pod_operator.py", line 403, in create_new_pod_for_operator
final_state, result = launcher.monitor_pod(pod=pod, get_logs=self.get_logs)
File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/airflow/kubernetes/pod_launcher.py", line 155, in monitor_pod
for line in logs:
File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/urllib3/response.py", line 807, in __iter__
for chunk in self.stream(decode_content=True):
File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/urllib3/response.py", line 571, in stream
for line in self.read_chunked(amt, decode_content=decode_content):
File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/urllib3/response.py", line 792, in read_chunked
self._original_response.close()
File "/opt/bitnami/python/lib/python3.6/contextlib.py", line 99, in __exit__
self.gen.throw(type, value, traceback)
File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/urllib3/response.py", line 454, in _error_catcher
raise ProtocolError("Connection broken: %r" % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
[2020-11-06 13:04:32,743] {taskinstance.py:1194} INFO - Marking task as UP_FOR_RETRY. dag_id=..., task_id=..., execution_date=20201106T120000, start_date=20201106T130329, end_date=20201106T130432
[2020-11-06 13:04:34,641] {local_task_job.py:102} INFO - Task exited with return code 1
The bug goes away by setting get_logs=False in the KubernetesPodOperator. Reproduced with multiple DAGs and tasks. Note that the stream breaks exactly 60 seconds after the pod enters the Running state (13:03:32 to 13:04:32), which suggests the log connection is being dropped while the task produces no output.
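For reference, a minimal sketch of the workaround in DAG form (the DAG id, image, and command below are placeholders, not taken from the issue):

```python
from airflow import DAG
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator
from airflow.utils.dates import days_ago

# Placeholder DAG; the only load-bearing detail here is get_logs=False.
dag = DAG("k8s_pod_example", start_date=days_ago(1), schedule_interval=None)

fetcher = KubernetesPodOperator(
    task_id="fetcher",
    name="fetcher",
    namespace="default",
    image="python:3.6-slim",
    cmds=["python", "-c"],
    arguments=["import time; time.sleep(3600)"],  # long-running, mostly silent task
    get_logs=False,  # workaround: skip log streaming; Airflow polls pod state instead
    dag=dag,
)
```

The trade-off is that the container's stdout no longer appears in the Airflow task log and has to be fetched from the cluster instead (for example with kubectl logs).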

Can you please give me an example?
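The fix referred to in the next comment is not included in this excerpt. As a rough sketch of the general shape such a fix could take — reconnecting on ProtocolError instead of letting the exception fail the task — where every name and parameter is an assumption, not the actual patch:

```python
import urllib3

def follow_pod_logs(core_api, name, namespace, max_reconnects=5):
    """Yield pod log lines, re-opening the stream if it breaks mid-read.

    core_api is a kubernetes.client.CoreV1Api instance.
    """
    reconnects = 0
    kwargs = {}
    while True:
        resp = core_api.read_namespaced_pod_log(
            name=name,
            namespace=namespace,
            container="base",        # container name used by the operator
            follow=True,
            _preload_content=False,  # stream the response instead of buffering it
            **kwargs,
        )
        try:
            for line in resp:
                yield line.decode("utf-8", errors="replace")
            return  # stream ended cleanly, i.e. the pod finished
        except urllib3.exceptions.ProtocolError:
            reconnects += 1
            if reconnects > max_reconnects:
                raise
            # Re-request only recent output on reconnect. Choosing this
            # window naively can duplicate or drop lines, which matches
            # the problem reported in the next comment.
            kwargs["since_seconds"] = 1
```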
@dmateusp I found an issue with the above fix. When running with the following arguments, only the first print() is logged; the later ones are discarded:

Log output:

Expected log output, when using the following arguments (reduced sleep time to avoid IncompleteRead):
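The arguments and log output referenced above were attachments in the original comment and are missing from this excerpt. As a purely hypothetical reconstruction of the kind of task under discussion (all values invented), a container that prints, sleeps past the point where the stream breaks, and prints again exercises both problems:

```python
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

# Hypothetical reproduction task: with get_logs=True and a long sleep the
# stream breaks with IncompleteRead; with a naive reconnect fix, the second
# print() can be lost. `dag` is the placeholder DAG from the earlier sketch.
repro = KubernetesPodOperator(
    task_id="repro_log_gap",
    name="repro-log-gap",
    namespace="default",
    image="python:3.6-slim",
    cmds=["python", "-u", "-c"],  # -u flushes prints immediately
    arguments=["import time; print('first'); time.sleep(300); print('second')"],
    get_logs=True,
    dag=dag,
)
```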