LocalTaskJob heartbeat race condition with finishing task causing SIGTERM
Apache Airflow version: 2.0.2
Environment:
- Cloud provider or hardware configuration:
- OS (e.g. from /etc/os-release): Ubuntu 18.04.2 LTS
- Kernel (e.g. uname -a): Linux datadumpprod2 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: Docker
What happened:
After task execution is done but before the task process has exited, the heartbeat callback kills the process because it falsely detects an external change of state.
[2021-06-02 20:40:55,273] {{taskinstance.py:1532}} INFO - Marking task as FAILED. dag_id=<dag_id>, task_id=<task_id>, execution_date=20210602T104000, start_date=20210602T184050, end_date=20210602T184055
[2021-06-02 20:40:55,768] {{local_task_job.py:188}} WARNING - State of this instance has been externally set to failed. Terminating instance.
[2021-06-02 20:40:55,770] {{process_utils.py:100}} INFO - Sending Signals.SIGTERM to GPID 2055
[2021-06-02 20:40:55,770] {{taskinstance.py:1265}} ERROR - Received SIGTERM. Terminating subprocesses.
[2021-06-02 20:40:56,104] {{process_utils.py:66}} INFO - Process psutil.Process(pid=2055, status='terminated', exitcode=1, started='20:40:49') (2055) terminated with exit code 1
This happens more often when the mini scheduler is enabled, because the window for the race condition is then larger (it includes the time the mini scheduler takes to run).
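The sequence described above can be illustrated with a small, deterministic simulation. This is only a sketch of the suspected race, not Airflow's actual code: FakeTask, naive_heartbeat and simulate are hypothetical names, and the check is assumed to treat any non-running state of a live process as an external change.

```python
from dataclasses import dataclass


@dataclass
class FakeTask:
    """Hypothetical stand-in for a task instance and its OS process."""
    state: str = "running"      # state as stored in the metadata database
    process_alive: bool = True  # whether the task process still exists


def naive_heartbeat(task):
    """Assumed check: a live process with a non-running state looks 'external'."""
    if task.process_alive and task.state != "running":
        return "SIGTERM"
    return "ok"


def simulate(heartbeat_during_window):
    """Replay the task lifecycle and record what each heartbeat decides."""
    task = FakeTask()
    decisions = [naive_heartbeat(task)]  # task body still executing

    # The task itself records its final state (FAILED in the log above)...
    task.state = "failed"

    # ...but the process is still alive, e.g. while the mini scheduler runs.
    if heartbeat_during_window:
        decisions.append(naive_heartbeat(task))

    # The process exits on its own; later heartbeats have nothing to kill.
    task.process_alive = False
    decisions.append(naive_heartbeat(task))
    return decisions


print(simulate(heartbeat_during_window=True))
print(simulate(heartbeat_during_window=False))
```

In this sketch, the longer the process stays alive after writing its final state, the more likely a heartbeat lands inside the window, which is consistent with the mini scheduler making the problem more frequent.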
What you expected to happen:
The heartbeat should allow the task to finish and shouldn’t kill it.
How to reproduce it:
As it’s a race condition, it happens randomly. To make it occur more often, enable the mini scheduler and use a database large enough that the mini scheduler run takes as long as possible. You can also reduce the heartbeat interval to the minimum.
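One possible way to apply these settings is through environment variable overrides. The sketch below assumes that the mini scheduler corresponds to the [scheduler] schedule_after_task_execution option and the heartbeat interval to [scheduler] job_heartbeat_sec; the issue does not name these options, so verify them against the configuration reference for your Airflow version. The helper apply_overrides is hypothetical.

```python
import os

# Assumed option names (not stated in the issue itself); Airflow reads
# overrides from variables of the form AIRFLOW__{SECTION}__{KEY}.
OVERRIDES = {
    # Keep the mini scheduler on so the post-task window is as long as possible.
    "AIRFLOW__SCHEDULER__SCHEDULE_AFTER_TASK_EXECUTION": "True",
    # Heartbeat as often as possible (value in seconds).
    "AIRFLOW__SCHEDULER__JOB_HEARTBEAT_SEC": "1",
}


def apply_overrides(base_env, overrides):
    """Return a copy of base_env with the overrides applied on top."""
    merged = dict(base_env)
    merged.update(overrides)
    return merged


env = apply_overrides(os.environ, OVERRIDES)
for key in sorted(OVERRIDES):
    print(f"{key}={env[key]}")
```

The resulting env mapping could be passed to whatever launches the Airflow components (for example as the environment of a container or subprocess). Because the failure is timing-dependent, these settings only raise the odds of hitting it.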
Issue Analytics
- State:
- Created 2 years ago
- Comments:32 (18 by maintainers)
Top GitHub Comments
I see the issue has been closed, but I am still experiencing it.
@ephraimbuddy I have no dagrun_timeout