Logging bug in long runs
Apache Airflow version: 2.0.2
Environment: Kubernetes v1.18.3, OpenShift 4.5.37
What happened:
We run our Python code in Kubernetes pod operators (airflow.contrib.operators.kubernetes_pod_operator).
During long runs (>10 h) with log fetching turned on (get_logs=True on the operator), Airflow behaves completely normally at first and then throws an unexpected error.
If we set get_logs=False, the DAG run succeeds; with get_logs=True, we get the same error every time.
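Why the get_logs toggle matters can be seen in the traceback: the error is raised from `for line in logs:` inside `monitor_pod`, i.e. while iterating a long-lived HTTP log stream. The toy model below is not the provider's code; `monitor_pod_model`, `open_log_stream`, `poll_status` and `StreamBroken` are hypothetical names. It only illustrates that a dropped log connection can fail the task on the get_logs=True path, while the get_logs=False path makes nothing but short status polls.

```python
from typing import Callable, Iterable, Iterator


class StreamBroken(Exception):
    """Hypothetical stand-in for urllib3.exceptions.ProtocolError."""


def monitor_pod_model(get_logs: bool,
                      open_log_stream: Callable[[], Iterable[str]],
                      poll_status: Callable[[], str]) -> str:
    """Simplified pod monitor loop (hypothetical, for illustration only)."""
    if get_logs:
        # Draining a long-lived stream: any error raised while iterating it
        # propagates out of this function and fails the task.
        for line in open_log_stream():
            print(line)
    # Short, independent status requests; nothing long-lived to break.
    status = poll_status()
    while status == "Running":
        status = poll_status()
    return status


def flaky_stream() -> Iterator[str]:
    """Simulates a log stream whose connection is cut mid-run."""
    yield "step 1 done"
    yield "step 2 done"
    raise StreamBroken("Connection broken: IncompleteRead(0 bytes read)")


statuses = iter(["Running", "Succeeded"])
print(monitor_pod_model(False, flaky_stream, lambda: next(statuses)))
```

With get_logs=False the model returns the final pod status; with get_logs=True the same flaky stream raises `StreamBroken` before the status is ever checked.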
Logs:
[2021-05-18 13:54:10,199] {taskinstance.py:1482} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/response.py", line 696, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/response.py", line 436, in _error_catcher
    yield
  File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/response.py", line 763, in read_chunked
    self._update_chunk_length()
  File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/response.py", line 700, in _update_chunk_length
    raise httplib.IncompleteRead(line)
http.client.IncompleteRead: IncompleteRead(0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1138, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1311, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1341, in _execute_task
    result = task_copy.execute(context=context)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 366, in execute
    final_state, _, result = self.create_new_pod_for_operator(labels, launcher)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 513, in create_new_pod_for_operator
    final_state, result = launcher.monitor_pod(pod=self.pod, get_logs=self.get_logs)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/providers/cncf/kubernetes/utils/pod_launcher.py", line 145, in monitor_pod
    for line in logs:
  File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/response.py", line 807, in __iter__
    for chunk in self.stream(decode_content=True):
  File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/response.py", line 571, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/response.py", line 792, in read_chunked
    self._original_response.close()
  File "/usr/local/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/response.py", line 454, in _error_catcher
    raise ProtocolError("Connection broken: %r" % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
[2021-05-18 13:54:10,204] {taskinstance.py:1532} INFO - Marking task as FAILED. dag_id=pipline, task_id=task7, execution_date=20210518T132920, start_date=20210518T133244, end_date=20210518T135410
[2021-05-18 13:54:10,280] {local_task_job.py:146} INFO - Task exited with return code 1
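The innermost frames show how the break surfaces: urllib3 was reading an HTTP/1.1 chunked body, expected a chunk-size line, and got b'' (end of stream, i.e. the connection had been closed), so int(line, 16) raised ValueError, which was re-raised as IncompleteRead and finally wrapped as ProtocolError. The stdlib-only sketch below is not urllib3's code; `read_chunked` here is a hypothetical minimal decoder (no trailers, chunk extensions simply dropped) that reproduces the first two links of that chain on a truncated body.

```python
import http.client
import io


def read_chunked(fp: io.BufferedIOBase) -> bytes:
    """Minimal HTTP/1.1 chunked-body decoder (illustrative sketch)."""
    body = bytearray()
    while True:
        line = fp.readline()  # b"" means the peer closed the connection
        # Drop any ";ext=..." chunk extension and surrounding whitespace.
        size_field = line.split(b";", 1)[0].strip()
        try:
            size = int(size_field, 16)  # int(b"", 16) raises ValueError
        except ValueError:
            # Raised inside the except block, so the ValueError is kept as
            # __context__ ("During handling of the above exception ...").
            raise http.client.IncompleteRead(bytes(body))
        if size == 0:
            fp.readline()  # CRLF after the last-chunk (no trailers here)
            return bytes(body)
        chunk = fp.read(size)
        if len(chunk) < size:
            # Connection dropped in the middle of a chunk's data.
            raise http.client.IncompleteRead(bytes(body) + chunk, size - len(chunk))
        body += chunk
        fp.readline()  # CRLF that terminates each chunk


print(read_chunked(io.BytesIO(b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n")))
```

Feeding it a body cut off right after a complete chunk (for example b"5\r\nhello\r\n") raises IncompleteRead with the ValueError attached as its context, the same pairing as the first two tracebacks above.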
We have another Airflow instance on a different Kubernetes cluster, where the same code with the same DAGs runs without errors.
Issue Analytics
- Created 2 years ago
- Comments: 17 (11 by maintainers)
Top GitHub Comments
And I heartily recommend the “search” on the Airflow docs site. It's really fast and really good:
@sg27 Because you are looking in the wrong place. This is a fix in the Kubernetes provider, not in Airflow itself: https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/index.html
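The comment does not spell out what the provider fix does, so the following is only a generic sketch of one way to survive a dropped log stream: reopen it and resume after the last line already seen. `follow_logs`, `open_stream` and the `(timestamp, line)` record shape are hypothetical; the sketch assumes the log source can be reopened from a given timestamp (the Kubernetes pod-log API has `sinceTime` and `timestamps` options suited to this). The default `retry_on` uses stdlib exceptions only; with urllib3 you would pass `urllib3.exceptions.ProtocolError`, the error at the bottom of the traceback, which is not a subclass of the built-in `ConnectionError`.

```python
import http.client
from typing import Callable, Iterable, Iterator, Optional, Tuple, Type

Record = Tuple[float, str]  # (timestamp, log line); hypothetical shape


def follow_logs(
    open_stream: Callable[[Optional[float]], Iterable[Record]],
    max_reconnects: int = 5,
    retry_on: Tuple[Type[BaseException], ...] = (http.client.IncompleteRead,
                                                 ConnectionError),
) -> Iterator[str]:
    """Yield log lines, reopening the stream after a broken connection.

    open_stream(since) must return records with timestamp >= since, or all
    records when since is None. Records at or before the last timestamp
    already yielded are skipped, so the overlap is not emitted twice.
    """
    last_ts: Optional[float] = None
    reconnects = 0
    while True:
        try:
            for ts, line in open_stream(last_ts):
                if last_ts is not None and ts <= last_ts:
                    continue  # already yielded before the reconnect
                last_ts = ts
                yield line
            return  # stream ended normally (e.g. the container exited)
        except retry_on:
            reconnects += 1
            if reconnects > max_reconnects:
                raise  # give up and surface the original error
```

One design caveat: deduplicating by timestamp drops distinct lines that share the exact same timestamp as the last yielded line, so a real implementation would need a finer-grained resume marker or tolerate a few duplicates instead.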