DAG on_failure_callback uses wrong context

Apache Airflow version

2.4.0

What happened

When a task fails in a DAG, the on_failure_callback registered while creating the dag is triggered using the context of a random task instance.

What you think should happen instead

The expectation is that one of the task instances that caused the dag failure should be used instead of a random task instance.

How to reproduce

Run the DAG below.

import datetime
from airflow.models.dag import DAG
from airflow.operators.python import PythonOperator
from airflow.operators.empty import EmptyOperator

def all_bad():
    raise Exception("I have failed")

def all_good():
    print("ALL GOOD")

def failure_callback_dag(context):
    print("Inside failure_callback_dag")
    print(context["task_instance"])
    print(context["task"])

with DAG(
    dag_id="test_dag",
    schedule_interval=None,
    start_date=datetime.datetime(2021, 1, 1),
    catchup=False,
    on_failure_callback=failure_callback_dag,
) as dag:

    start = EmptyOperator(task_id="start")

    # This task always raises, which fails the DAG run.
    fail = PythonOperator(
        provide_context=True,
        task_id="fail",
        python_callable=all_bad,
    )

    # Named "passs" because "pass" is a reserved word in Python.
    passs = PythonOperator(
        provide_context=True,
        task_id="pass",
        python_callable=all_good,
    )

    start >> [passs, fail]

From the dag processor logs:

The context passed to the callback comes from the task instance that succeeded (pass), not from the one that failed (fail).

[2022-09-28T18:28:14.465+0000] {logging_mixin.py:117} INFO - [2022-09-28T18:28:14.463+0000] {dag.py:1292} INFO - Executing dag callback function: <function failure_callback_dag at 0x7fd17ca18560>
[2022-09-28T18:28:14.943+0000] {logging_mixin.py:117} INFO - Inside failure_callback_dag
[2022-09-28T18:28:14.944+0000] {logging_mixin.py:117} INFO - <TaskInstance: test_dag.pass manual__2022-09-28T18:27:59.612118+00:00 [success]>
[2022-09-28T18:28:14.944+0000] {logging_mixin.py:117} INFO - <Task(PythonOperator): pass>

Operating System

Debian GNU/Linux 10 (buster)

Versions of Apache Airflow Providers

Default providers that are present in the official airflow docker image.

Deployment

Docker-Compose

Deployment details

No response

Anything else

I am not sure whether this is expected behaviour. If it is, it needs to be documented in Callbacks.
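Until this is clarified, one possible workaround is to ignore context["task_instance"] in the DAG-level callback and look up the failed task instances from the DAG run instead. The sketch below is not a confirmed fix. It assumes that context["dag_run"] is available in the DAG-level callback context, that DagRun.get_task_instances() returns every task instance of the run, and that a failed task instance has a state equal to the string "failed". The helper name failed_task_instances is made up for illustration.

```python
def failed_task_instances(context):
    """Return the task instances of this DAG run whose state is "failed".

    Assumption: context["dag_run"] exposes get_task_instances(), as
    Airflow's DagRun does, and each task instance has .task_id and .state.
    """
    dag_run = context["dag_run"]
    # Compare with == so that both plain strings and str-based enum
    # members equal to "failed" are matched.
    return [ti for ti in dag_run.get_task_instances() if ti.state == "failed"]


def failure_callback_dag(context):
    """DAG-level callback that reports the failed tasks rather than
    whichever task instance happens to be in the context."""
    failed = failed_task_instances(context)
    for ti in failed:
        print(f"Failed task: {ti.task_id} (state={ti.state})")
    # Returning the ids is only for convenience when testing the function.
    return [ti.task_id for ti in failed]
```

With the DAG from the reproduction above, this callback would be expected to report the fail task and skip the pass task, regardless of which task instance the context itself carries.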

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Issue Analytics

  • State: open
  • Created a year ago
  • Reactions: 1
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

3 reactions
uranusjr commented, Sep 29, 2022

Perhaps we should rename the DAG-level argument to on_dag_failure_callback instead?

2 reactions
josh-fell commented, Sep 28, 2022

Yeah, looking at the Callbacks documentation, I definitely agree it’s misleading and could be improved.
