LocalTaskJob heartbeat race condition with finishing task causing SIGTERM

Apache Airflow version: 2.0.2

Environment:

  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release): Ubuntu 18.04.2 LTS
  • Kernel (e.g. uname -a): Linux datadumpprod2 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: Docker

What happened:

After the task has finished executing but before its process has exited, the heartbeat callback kills the process because it falsely detects an external change of state.

[2021-06-02 20:40:55,273] {{taskinstance.py:1532}} INFO - Marking task as FAILED. dag_id=<dag_id>, task_id=<task_id>, execution_date=20210602T104000, start_date=20210602T184050, end_date=20210602T184055
[2021-06-02 20:40:55,768] {{local_task_job.py:188}} WARNING - State of this instance has been externally set to failed. Terminating instance.
[2021-06-02 20:40:55,770] {{process_utils.py:100}} INFO - Sending Signals.SIGTERM to GPID 2055
[2021-06-02 20:40:55,770] {{taskinstance.py:1265}} ERROR - Received SIGTERM. Terminating subprocesses.
[2021-06-02 20:40:56,104] {{process_utils.py:66}} INFO - Process psutil.Process(pid=2055, status='terminated', exitcode=1, started='20:40:49') (2055) terminated with exit code 1

This happens more often when the mini scheduler is enabled, because the window for the race condition is then larger (it includes the mini scheduler’s execution time).
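The ordering problem described above can be modelled with a small, deterministic sketch. Everything in it is hypothetical: `FakeTaskProcess`, `naive_heartbeat`, and `run_task` are illustrative stand-ins, not Airflow classes or functions. It only shows why a heartbeat that lands between "state recorded" and "process exited" ends up sending SIGTERM.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeTaskProcess:
    """Hypothetical stand-in for a task process; not an Airflow class."""
    db_state: str = "running"          # state recorded in the metadata database
    return_code: Optional[int] = None  # None while the process is still alive
    signals: List[str] = field(default_factory=list)


def naive_heartbeat(proc: FakeTaskProcess) -> None:
    """Treat any non-running state on a live process as an external change."""
    if proc.db_state != "running" and proc.return_code is None:
        proc.signals.append("SIGTERM")


def run_task(proc: FakeTaskProcess, heartbeat_in_window: bool) -> None:
    # 1. The task finishes its work and records its own final state.
    proc.db_state = "failed"
    # 2. Post-execution work (for example the mini scheduler) runs while the
    #    process is still alive. A heartbeat landing here sees a terminal
    #    state on a live process.
    if heartbeat_in_window:
        naive_heartbeat(proc)
    # 3. The process exits.
    proc.return_code = 1


unlucky = FakeTaskProcess()
run_task(unlucky, heartbeat_in_window=True)

lucky = FakeTaskProcess()
run_task(lucky, heartbeat_in_window=False)
naive_heartbeat(lucky)  # heartbeat arrives after exit: nothing left to kill

print(unlucky.signals, lucky.signals)
```

The longer step 2 takes, the more likely a heartbeat falls inside it, which matches the observation that the mini scheduler makes the problem more frequent.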

What you expected to happen:

The heartbeat should allow the task to finish and shouldn’t kill it.
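One way to picture the expected behaviour is a heartbeat that tolerates a terminal state for a short grace period before terminating. This is only a sketch under assumed names (`FinishingTask`, `tolerant_heartbeat`, and `grace_heartbeats` are all hypothetical); it is not how Airflow implements its heartbeat callback and not necessarily how the issue was fixed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FinishingTask:
    """Hypothetical stand-in for a task process; not an Airflow class."""
    db_state: str = "running"          # state recorded in the metadata database
    return_code: Optional[int] = None  # None while the process is still alive
    terminal_heartbeats: int = 0       # heartbeats seen since state went terminal
    signals: List[str] = field(default_factory=list)


def tolerant_heartbeat(task: FinishingTask, grace_heartbeats: int = 2) -> None:
    """Send SIGTERM only if the process outlives the grace period."""
    if task.db_state == "running" or task.return_code is not None:
        return  # still running normally, or already exited
    task.terminal_heartbeats += 1
    if task.terminal_heartbeats > grace_heartbeats:
        task.signals.append("SIGTERM")


# A task that is wrapping up: one heartbeat lands in the window, then it exits.
finishing = FinishingTask(db_state="success")
tolerant_heartbeat(finishing)
finishing.return_code = 0
tolerant_heartbeat(finishing)

# A task whose state was changed but whose process never exits.
stuck = FinishingTask(db_state="failed")
for _ in range(3):
    tolerant_heartbeat(stuck)

print(finishing.signals, stuck.signals)
```

The trade-off in this design is that a genuinely external state change takes a few extra heartbeats to be enforced.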

How to reproduce it:

Because this is a race condition, it happens randomly. To make it occur more often, enable the mini scheduler and use a database large enough that the mini scheduler takes as long as possible to run. You can also reduce the heartbeat interval to the minimum.
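The reproduction settings could be applied through environment variables before starting Airflow, as sketched below. The option names are assumptions based on the Airflow 2.0 configuration reference (`schedule_after_task_execution` and `job_heartbeat_sec` in the `[scheduler]` section), and the value `1` is only an illustrative "small" interval; check both against the configuration reference for your version.

```python
import os

# Assumed mapping of the reproduction steps to Airflow settings; verify the
# option names against the configuration reference for your Airflow version.
repro_env = {
    # Run the mini scheduler after each task finishes.
    "AIRFLOW__SCHEDULER__SCHEDULE_AFTER_TASK_EXECUTION": "True",
    # Heartbeat as often as possible (seconds; illustrative value).
    "AIRFLOW__SCHEDULER__JOB_HEARTBEAT_SEC": "1",
}

os.environ.update(repro_env)

for name, value in repro_env.items():
    print(f"{name}={value}")
```

These variables must be set in the environment of the processes that run the tasks, not only in the shell where the snippet is executed.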

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 32 (18 by maintainers)

Top GitHub Comments

1 reaction
mwaaas commented, Aug 3, 2021

I see the issue has been closed, but I am still experiencing it.

1 reaction
millin commented, Jun 4, 2021

@ephraimbuddy I have no dagrun_timeout

